Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread James Zhu



On 2024-05-01 18:56, Philip Yang wrote:

On systems with khugepaged enabled and use cases with THP buffers,
hmm_range_fault may take more than 15 seconds to return -EBUSY. The
arbitrary timeout value is not accurate and causes memory allocation failures.

Remove the arbitrary timeout value and return -EAGAIN to the application if
hmm_range_fault returns -EBUSY, so that userspace libdrm and Thunk call the
ioctl again.

Report -EAGAIN with a debug message, as it is not an error.

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c  | 12 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c |  5 +
  3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54198c3928c7..02696c2102f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1087,7 +1087,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
  
  	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, );

if (ret) {
-   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+   if (ret == -EAGAIN)
+   pr_debug("Failed to get user pages, try again\n");
+   else
+   pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
goto unregister_out;
}
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

index 431ec72655ec..e36fede7f74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -202,20 +202,12 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
hmm_range->start, hmm_range->end);
  
-		/* Assuming 64MB takes maximum 1 second to fault page address */

-   timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
-   timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
-   timeout = jiffies + msecs_to_jiffies(timeout);
+   timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

[JZ] should we reduce MAX_WALK_BYTE to 64M in the meantime?
  
  retry:

hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
r = hmm_range_fault(hmm_range);
if (unlikely(r)) {
-   schedule();

[JZ] The above is a workaround for a CPU stall; we may still need to keep it.

-   /*
-* FIXME: This timeout should encompass the retry from
-* mmu_interval_read_retry() as well.
-*/
if (r == -EBUSY && !time_after(jiffies, timeout))
goto retry;
goto out_free_pfns;
@@ -247,6 +239,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
  out_free_range:
kfree(hmm_range);
  
+	if (r == -EBUSY)

+   r = -EAGAIN;
return r;
  }
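
For reference, the scaling that this hunk removes can be worked through in isolation. The sketch below reproduces the old computation (one default timeout per 64MB of range, at least one) next to the fixed value the patch uses. It assumes HMM_RANGE_DEFAULT_TIMEOUT is 1000 ms; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's HMM_RANGE_DEFAULT_TIMEOUT, in milliseconds. */
#define HMM_RANGE_DEFAULT_TIMEOUT_MS 1000UL

/* Old behavior: one default timeout per 64MB (1 << 26 bytes), at least one. */
static unsigned long old_timeout_ms(uint64_t start, uint64_t end)
{
	unsigned long chunks;

	assert(end >= start);
	chunks = (unsigned long)((end - start) >> 26);
	if (chunks < 1)
		chunks = 1;
	return chunks * HMM_RANGE_DEFAULT_TIMEOUT_MS;
}

/* New behavior: a fixed timeout, independent of the range size. */
static unsigned long new_timeout_ms(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return HMM_RANGE_DEFAULT_TIMEOUT_MS;
}
```

Under that assumption, a 1GB range used to get 16 seconds before giving up, while a single page got 1 second; after the patch both get the same fixed value and any remaining -EBUSY is handed back to the caller as -EAGAIN.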
  
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c

index 94f83be2232d..e7040f809f33 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1670,11 +1670,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
   readonly, owner, NULL,
   _range);
WRITE_ONCE(p->svms.faulting_task, NULL);
-   if (r) {
+   if (r)
pr_debug("failed %d to get svm range pages\n", r);
-   if (r == -EBUSY)
-   r = -EAGAIN;
-   }
} else {
r = -EFAULT;
}
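
The commit message above relies on userspace calling the ioctl again when it sees EAGAIN. A minimal sketch of such a retry wrapper follows. It is not the actual libdrm or Thunk code (libdrm's drmIoctl loops on EINTR and EAGAIN in a similar spirit); `retry_on_eagain`, `call_fn`, and `flaky_call` are hypothetical names used only for illustration, and the bounded retry count is my own addition.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type: returns 0 on success, -1 with errno set on failure. */
typedef int (*call_fn)(void *arg);

/*
 * Call fn(arg) until it succeeds, fails with something other than
 * EAGAIN/EINTR, or max_tries attempts have been made.
 */
static int retry_on_eagain(call_fn fn, void *arg, int max_tries)
{
	int ret = -1;

	assert(fn != 0);
	for (int i = 0; i < max_tries; i++) {
		ret = fn(arg);
		if (ret == 0 || (errno != EAGAIN && errno != EINTR))
			break;
	}
	return ret;
}

/* Example stub: fails with EAGAIN until the counter behind arg reaches zero. */
static int flaky_call(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

In a real caller the callback would wrap the ioctl itself; the stub only stands in for a kernel path that returns -EAGAIN a few times before succeeding.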


Re: [PATCH] drm/amd/amdxcp: Use unique name for partition dev

2024-04-30 Thread James Zhu

On 2024-04-30 07:36, Lijo Lazar wrote:

amdxcp is a platform driver for creating partition devices. The libdrm
library identifies a platform device based on 'OF_FULLNAME' or
'MODALIAS'. If two or more devices have the same platform name, libdrm
picks only the first device. The platform driver core uses the name of
the device to populate 'MODALIAS'. When 'amdxcp' is used as the base
name, only the first partition device gets identified. Assign a unique
name so that libdrm identifies the partition devices separately.

amdxcp doesn't support probing of partitions, so it does not care about
modaliases.

Signed-off-by: Lijo Lazar


Acked-by: James Zhu


---
  drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
index 90ddd8371176..b4131053b31b 100644
--- a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
+++ b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
@@ -50,12 +50,14 @@ int amdgpu_xcp_drm_dev_alloc(struct drm_device **ddev)
  {
struct platform_device *pdev;
struct xcp_device *pxcp_dev;
+   char dev_name[20];
int ret;
  
  	if (pdev_num >= MAX_XCP_PLATFORM_DEVICE)

return -ENODEV;
  
-	pdev = platform_device_register_simple("amdgpu_xcp", pdev_num, NULL, 0);

+   snprintf(dev_name, sizeof(dev_name), "amdgpu_xcp_%d", pdev_num);
+   pdev = platform_device_register_simple(dev_name, -1, NULL, 0);
if (IS_ERR(pdev))
return PTR_ERR(pdev);
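
The naming change can be illustrated outside the kernel. The sketch below models the point made in the commit message: the modalias is built from the device's base name alone, so devices sharing a base name collide, while per-partition base names do not. The "platform:" prefix reflects how the platform bus formats its modalias; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the per-partition base name the way the patch does. */
static void xcp_base_name(char *buf, size_t len, int pdev_num)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "amdgpu_xcp_%d", pdev_num);
}

/* Model of the modalias string, which depends only on the base name. */
static void platform_modalias(char *buf, size_t len, const char *base_name)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "platform:%s", base_name);
}

/* Two devices look identical to a modalias-based lookup if this returns 1. */
static int modalias_collides(const char *name_a, const char *name_b)
{
	char a[64], b[64];

	platform_modalias(a, sizeof(a), name_a);
	platform_modalias(b, sizeof(b), name_b);
	return strcmp(a, b) == 0;
}
```

With the old scheme every partition was registered under the base name "amdgpu_xcp" and differed only by id, so `modalias_collides("amdgpu_xcp", "amdgpu_xcp")` is true; with the new scheme partitions 0 and 1 get distinct base names.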
  

Re: [PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942

2024-02-12 Thread James Zhu

Ping.

Best Regards!

James Zhu

On 2024-02-06 10:58, James Zhu wrote:

PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.

David Yat Sin (5):
   drm/amdkfd/kfd_ioctl: add pc sampling support
   drm/amdkfd: add pc sampling support
   drm/amdkfd: enable pc sampling query
   drm/amdkfd: enable pc sampling create
   drm/amdkfd: Set debug trap bit when enabling PC Sampling

James Zhu (19):
   drm/amdkfd: add pc sampling mutex
   drm/amdkfd: add trace_id return
   drm/amdkfd: check pcs_entry valid
   drm/amdkfd: enable pc sampling destroy
   drm/amdkfd: add interface to trigger pc sampling trap
   drm/amdkfd: trigger pc sampling trap for gfx v9
   drm/amdkfd/gfx9: enable host trap
   drm/amdgpu: use trapID 4 for host trap
   drm/amdgpu: add sq host trap status check
   drm/amdkfd: trigger pc sampling trap for arcturus
   drm/amdkfd: trigger pc sampling trap for aldebaran
   drm/amdkfd: use bit operation set debug trap
   drm/amdkfd: add setting trap pc sampling flag
   drm/amdkfd: enable pc sampling stop
   drm/amdkfd: add queue remapping
   drm/amdkfd: enable pc sampling start
   drm/amdkfd: add pc sampling thread to trigger trap
   drm/amdkfd: add pc sampling release when process release
   drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
  drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
  .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   75 +-
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   26 +
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h|3 +
  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
  .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  426 
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   46 +
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
  .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
  .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
  .../gpu/drm/amd/include/kgd_kfd_interface.h   |7 +
  include/uapi/linux/kfd_ioctl.h|   64 +-
  21 files changed, 1914 insertions(+), 1080 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h



[PATCH v4 12/24] drm/amdgpu: use trapID 4 for host trap

2024-02-06 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use
TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify
the host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 +
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 +
 3 files changed, 1070 insertions(+), 1054 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 7d8c0e13ac12..adfe5e5585e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
/* select *target_wave_slot */
value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++);
+   /* set TrapID 4 for HOSTTRAP */
+   value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4);
 
mutex_lock(>grbm_idx_mutex);
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index af1f678790e7..b3c681d7256b 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf82025e,
+   0xbf820001, 0xbf820263,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
-   0x00ff, 0xbf85001e,
+   0x00ff, 0xbf850023,
0x866eff7b, 0x0400,
-   0xbf85005b, 0xbf8e0010,
+   0xbf850060, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
-   0xbf850015, 0x866eff7b,
-   0x71ff, 0xbf840008,
-   0x866fff7b, 0x7080,
-   0xbf840001, 0xbeee1a87,
-   0xb8eff801, 0x8e6e8c6e,
-   0x866e6f6e, 0xbf85000a,
-   0x866eff6d, 0x00ff,
-   0xbf850007, 0xb8eef801,
-   0x866eff6e, 0x0800,
-   0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850040,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xbf0d8f7b, 0xbf840002,
-   0x877bff7b, 0x,
-   0xc0031c3d, 0x0010,
-   0xc0071bbd, 0x,
-   0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x8671ff6d,
-   0x0100, 0xbf840004,
-   0x92f1ff70, 0x00010001,
-   0xbf840016, 0xbf820005,
-   0x86708170, 0x8e709770,
-   0x8977ff77, 0x0080,
-   0x8077, 0x86ee6e6e,
-   0xbf840001, 0xbe801d6e,
-   0x866eff6d, 0x01ff,
-   0xbf850005, 0x8778ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x866eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f7a8b77,
+   0xbf85001a, 0x866eff6d,
+   0x01ff, 0xbf06ff6e,
+   0x0104, 0xbf850015,
+   0x866eff7b, 0x71ff,
+   0xbf840008, 0x866fff7b,
+   0x7080, 0xbf840001,
+   0xbeee1a87, 0xb8eff801,
+   0x8e6e8c6e, 0x866e6f6e,
+   0xbf85000a, 0x866eff6d,
+   0x00ff, 0xbf850007,
+   0xb8eef801, 0x866eff6e,
+   0x0800, 0xbf850003,
+   0x866eff7b, 0x0400,
+   0xbf850040, 0xb8faf807,
0x867aff7a, 0x001f8000,
-   0xb97af807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xbf0d8f7b,
+   0xbf840002, 0x877bff7b,
+   0x, 0xc0031c3d,
+   0x0010, 0xc0071bbd,
+   0x, 0xc0071ebd,
+   0x0008, 0xbf8cc07f,
+   0x8671ff6d, 0x0100,
+   0xbf840004, 0x92f1ff70,
+   0x00010001, 0xbf840016,
+   0xbf820005, 0x86708170,
+   0x8e709770, 0x8977ff77,
+   0x0080, 0x8077,
+   0x86ee6e6e, 0xbf840001,
+   0xbe801d6e, 0x866eff6d,
+   0x01ff, 0xbf850005,
+   0x8778ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820005, 0x866eff6d,
+   0x0100, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbf94, 0x877a8478,
-   0xb97af802, 0xbf8e0002,
-   0xbf88fffe, 0xb8fa2a05,
-   0x807a817a, 0x8e7a8
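
The host-trap test described in this patch's commit message (TTMP1 bit 24 as HT, bits 16-23 as trapID, with trapID 4 written through SQ_CMD.DATA) can be sketched in C as follows. This is only an illustration of the bit layout stated there; the real check lives in the trap handler assembly, which is not reproduced here, and the macro and function names are hypothetical. Requiring both the HT bit and trapID 4 is my reading of the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout taken from the commit message: bits 16-23 trapID, bit 24 HT. */
#define TTMP1_TRAP_ID_SHIFT 16
#define TTMP1_TRAP_ID_MASK  0xffu
#define TTMP1_HT_BIT        (1u << 24)
#define HOST_TRAP_ID        4u	/* value the patch writes to SQ_CMD.DATA */

/* Extract the 8-bit trapID field. */
static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
	return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* A trap is treated as a PC-sampling host trap if HT is set and trapID is 4. */
static bool ttmp1_is_host_trap(uint32_t ttmp1)
{
	return (ttmp1 & TTMP1_HT_BIT) != 0 &&
	       ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```

Bits outside the two fields are ignored, so the low 16 bits of TTMP1 do not affect the result.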

[PATCH v4 21/24] drm/amdkfd: add pc sampling thread to trigger trap

2024-02-06 Thread James Zhu
Add a kthread to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 91 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 6f50ba1f8989..ea9478c3738a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,84 @@ struct supported_pc_sample_info supported_formats[] = {
{ IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+   struct amdgpu_device *adev;
+   struct kfd_node *node = param;
+   uint32_t timeout = 0;
+   ktime_t next_trap_time;
+
+   mutex_lock(>pcs_data.mutex);
+   if (node->pcs_data.hosttrap_entry.base.active_count &&
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+   node->kfd2kgd->trigger_pc_sample_trap) {
+   switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {
+   case KFD_IOCTL_PCS_TYPE_TIME_US:
+   timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+   break;
+   default:
+   pr_debug("PC Sampling type %d not supported.",
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+   }
+   }
+   mutex_unlock(>pcs_data.mutex);
+   if (!timeout)
+   return -EINVAL;
+
+   adev = node->adev;
+
+   allow_signal(SIGKILL);
+   while (!kthread_should_stop() &&
+   !READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable) &&
+   !signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   next_trap_time = ktime_add_us(ktime_get_raw(), timeout);
+
+   node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+   >pcs_data.hosttrap_entry.base.target_simd,
+   >pcs_data.hosttrap_entry.base.target_wave_slot,
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+   pr_debug_ratelimited("triggered a host trap.");
+
+   might_sleep();
+   do {
+   ktime_t wait_time;
+   s64 wait_ns, wait_us;
+
+   wait_time = ktime_sub(next_trap_time, ktime_get_raw());
+   wait_ns = ktime_to_ns(wait_time);
+   wait_us = ktime_to_us(wait_time);
+   if (wait_ns >= 1)
+   usleep_range(wait_us - 10, wait_us);
+   else if (wait_ns > 0)
+   schedule();
+   else
+   break;
+   } while (1);
+   }
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+   return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+   char thread_name[16];
+   int ret = 0;
+
+   snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+   kthread_run(kfd_pc_sample_thread, node, thread_name);
+
+   if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   ret = PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+   pr_debug("Failed to create pc sample thread for %s with ret = %d.",
+   thread_name, ret);
+   }
+
+   return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *user_args)
 {
@@ -99,6 +177,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd,
struct pc_sampling_entry *pcs_entry)
 {
bool pc_sampling_start = false;
+   int ret = 0;
 
pcs_entry->enabled = true;
mutex_lock(>dev->pcs_data.mutex);
@@ -112,13 +191,16 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd,
mutex_unlock(>dev->pcs_data.mutex);
 
while (pc_sampling_start) {
-   if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+   if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
usleep_range(1000, 2000);
-   else
+   } else {
+
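
The pacing logic of kfd_pc_sample_thread in this patch (compute the next trap time, then sleep, yield, or proceed depending on how much time is left) can be sketched as a pure decision function. This is an illustrative model, not the driver code: the enum, the function name, and in particular `sleep_threshold_ns` are hypothetical, and the threshold is a parameter precisely because no specific value is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum wait_action {
	WAIT_SLEEP,	/* far from the deadline: sleep for most of the remainder */
	WAIT_YIELD,	/* close to the deadline: only yield the CPU */
	WAIT_DONE	/* deadline reached or passed: trigger the next trap */
};

/* Decide what the sampling loop should do given the time left until the deadline. */
static enum wait_action next_wait_action(int64_t deadline_ns, int64_t now_ns,
					 int64_t sleep_threshold_ns)
{
	int64_t remaining = deadline_ns - now_ns;

	assert(sleep_threshold_ns > 0);
	if (remaining >= sleep_threshold_ns)
		return WAIT_SLEEP;
	if (remaining > 0)
		return WAIT_YIELD;
	return WAIT_DONE;
}
```

Working against an absolute deadline rather than sleeping for a fixed interval keeps the time spent triggering the trap from accumulating as drift in the sampling period.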

[PATCH v4 22/24] drm/amdkfd: add pc sampling release when process release

2024-02-06 Thread James Zhu
Add pc sampling release on process release. It forces all active
sessions of the process to stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 25 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index ea9478c3738a..783844ddd82f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -337,6 +337,31 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+   struct pc_sampling_entry *pcs_entry;
+   struct idr *idp;
+   uint32_t id;
+
+   /* force to release all PC sampling task for this process */
+   idp = >dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+   do {
+   pcs_entry = NULL;
+   mutex_lock(>dev->pcs_data.mutex);
+   idr_for_each_entry(idp, pcs_entry, id) {
+   if (pcs_entry->pdd != pdd)
+   continue;
+   break;
+   }
+   mutex_unlock(>dev->pcs_data.mutex);
+   if (pcs_entry) {
+   if (pcs_entry->enabled)
+   kfd_pc_sample_stop(pdd, pcs_entry);
+   kfd_pc_sample_destroy(pdd, id, pcs_entry);
+   }
+   } while (pcs_entry);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 4a450abf9fa9..bbad0b0848df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n",
pdd->dev->id, p->pasid);
 
+   kfd_pc_sample_release(pdd);
+
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1
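
The loop in kfd_pc_sample_release above follows a common pattern: look up one matching entry while holding the lock, drop the lock, tear the entry down (the stop path in this series takes the same mutex itself), and repeat until nothing matches. A userspace sketch of that pattern, with a pthread mutex standing in for pcs_data.mutex and a fixed array standing in for the IDR; all type and function names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SESSIONS 8

struct session {
	int in_use;
	int owner;	/* stand-in for pcs_entry->pdd */
};

struct session_table {
	pthread_mutex_t lock;
	struct session slots[MAX_SESSIONS];
};

/* Return the index of one session owned by owner, or -1; holds the lock briefly. */
static int find_owned(struct session_table *t, int owner)
{
	int found = -1;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < MAX_SESSIONS; i++) {
		if (t->slots[i].in_use && t->slots[i].owner == owner) {
			found = i;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return found;
}

/* Tear down one session; takes the lock itself, so callers must not hold it. */
static void destroy_session(struct session_table *t, int idx)
{
	assert(idx >= 0 && idx < MAX_SESSIONS);
	pthread_mutex_lock(&t->lock);
	t->slots[idx].in_use = 0;
	pthread_mutex_unlock(&t->lock);
}

/* Release every session of owner; returns how many were released. */
static int release_all(struct session_table *t, int owner)
{
	int idx, released = 0;

	while ((idx = find_owned(t, owner)) >= 0) {
		destroy_session(t, idx);
		released++;
	}
	return released;
}
```

Restarting the search after each teardown costs a rescan per entry, but it avoids holding the lock across a call that needs to acquire it again.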



[PATCH v4 23/24] drm/amdkfd: Set debug trap bit when enabling PC Sampling

2024-02-06 Thread James Zhu
From: David Yat Sin 

We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC
Sampling so that the TTMP registers are valid inside the sampling data.
runtime_info.ttmp_setup will be cleared when the user application
does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without
KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit.

It is also not valid to have the debugger attached to a process while PC
sampling is enabled, so add some checks to prevent this.

Signed-off-by: David Yat Sin 
Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 13 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  3 ++
 5 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d9cac97c54c0..bc37f3ee2c66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2804,26 +2804,9 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug,
 
p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;
p->runtime_info.r_debug = r_debug;
-   p->runtime_info.ttmp_setup = enable_ttmp_setup;
 
-   if (p->runtime_info.ttmp_setup) {
-   for (i = 0; i < p->n_pdds; i++) {
-   struct kfd_process_device *pdd = p->pdds[i];
-
-   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
-   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   true,
-   pdd->dev->vm_info.last_vmid_kfd);
-   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
-   pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   false,
-   0);
-   }
-   }
-   }
+   if (enable_ttmp_setup)
+   kfd_dbg_enable_ttmp_setup(p);
 
 retry:
if (p->debug_trap_enabled) {
@@ -2972,10 +2955,10 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
goto out;
}
 
-   /* Check if target is still PTRACED. */
rcu_read_lock();
+   /* Check if target is still PTRACED. */
if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
-   && ptrace_parent(target->lead_thread) != current) {
+   && ptrace_parent(target->lead_thread) != current) {
pr_err("PID %i is not PTRACED and cannot be debugged\n", args->pid);
r = -EPERM;
}
@@ -2985,6 +2968,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
goto out;
 
mutex_lock(>mutex);
+   if (!!target->pc_sampling_ref) {
+   pr_debug("Cannot enable debug trap on PID:%d because PC Sampling active\n", args->pid);
+   r = -EBUSY;
+   goto unlock_out;
+   }
 
if (args->op != KFD_IOC_DBG_TRAP_ENABLE && !target->debug_trap_enabled) {
pr_err("PID %i not debug enabled for op %i\n", args->pid, args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index d889e3545120..8d836c65c636 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1120,3 +1120,29 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
 
mutex_unlock(>event_mutex);
 }
+
+void kfd_dbg_enable_ttmp_setup(struct kfd_process *p)
+{
+   int i;
+
+   if (p->runtime_info.ttmp_setup)
+   return;
+
+   p->runtime_info.ttmp_setup = true;
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   true,
+   pdd->dev->vm_info.last_vmid_kfd);
+   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
+ 

[PATCH v4 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2024-02-06 Thread James Zhu
Bump the minor version to declare that the pc sampling feature is now
available.

Signed-off-by: James Zhu 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index ec1b6404b185..7c2c867b57e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1
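
A userspace consumer could gate PC sampling on the version reported by the KFD (the major_version and minor_version fields of struct kfd_ioctl_get_version_args). A minimal sketch of that comparison follows; the function and macro names are hypothetical, and treating any later major version as also supporting the feature is an assumption on my part, not something this patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* First interface version listing the PC sampling ioctl, per this patch. */
#define PCS_MIN_MAJOR 1u
#define PCS_MIN_MINOR 16u

/* Return true if the reported KFD ioctl version is 1.16 or later. */
static bool kfd_version_has_pc_sampling(uint32_t major, uint32_t minor)
{
	if (major != PCS_MIN_MAJOR)
		return major > PCS_MIN_MAJOR;	/* assumption: later majors keep it */
	return minor >= PCS_MIN_MINOR;
}
```

A caller would fill in the two values from the get-version ioctl and skip PC sampling setup when the function returns false.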



[PATCH v4 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1



[PATCH v4 18/24] drm/amdkfd: enable pc sampling stop

2024-02-06 Thread James Zhu
Enable pc sampling stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 29 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b46caa52fbe8..53e44e68408e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,10 +99,33 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd)
return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_stop = false;
+
+   pcs_entry->enabled = false;
+   mutex_lock(>dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true);
+   pc_sampling_stop = true;
+   }
+   mutex_unlock(>dev->pcs_data.mutex);
+
+   kfd_process_set_trap_pc_sampling_flag(>qpd,
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
 
+   if (pc_sampling_stop) {
+
+   mutex_lock(>dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+   pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false);
+   mutex_unlock(>dev->pcs_data.mutex);
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -250,7 +273,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (!pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_stop(pdd);
+   return kfd_pc_sample_stop(pdd, pcs_entry);
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 5a7805147da0..7bdcbe6be4fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   uint32_t active_count;  /* Num of active sessions */
+   uint32_t target_simd;   /* target simd for trap */
+   uint32_t target_wave_slot;  /* target wave slot for trap */
+   bool stop_enable;   /* pc sampling stop in process */
struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1



[PATCH v4 19/24] drm/amdkfd: add queue remapping

2024-02-06 Thread James Zhu
Add queue remapping to ensure that any waves executing the PC sampling
part of the trap handler are done before kfd_pc_sample_stop returns,
and that no new waves enter that part of the trap handler afterwards.
This avoids race conditions that could lead to use-after-free. Unmapping
and remapping the queues either waits for the waves to drain, or preempts
them with CWSR, which itself executes a trap and waits for previous traps
to finish.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  4 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)
return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period)
+{
+   dqm_lock(dqm);
+   if (!dqm->dev->kfd->shared_resources.enable_mes)
+   execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+   dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 53e44e68408e..df2f4bfd0cda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
uint32_t ip_version;
@@ -115,9 +116,10 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
 
kfd_process_set_trap_pc_sampling_flag(>qpd,
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+   remap_queue(pdd->dev->dqm,
+   KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
 
if (pc_sampling_stop) {
-
mutex_lock(>dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
-- 
2.25.1



[PATCH v4 09/24] drm/amdkfd: add interface to trigger pc sampling trap

2024-02-06 Thread James Zhu
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..12f9021d563e 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
 
@@ -318,6 +320,11 @@ struct kfd2kgd_calls {
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
+   uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1



[PATCH v4 17/24] drm/amdkfd: add setting trap pc sampling flag

2024-02-06 Thread James Zhu
Add setting trap pc sampling flag.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 2df240518d1f..5a7805147da0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 3e3cead6ccf8..4a450abf9fa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1463,6 +1463,19 @@ void kfd_process_set_trap_debug_flag(struct 
qcm_process_device *qpd,
}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(method, &tma[2]);
+   else
+   clear_bit(method, &tma[2]);
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1



[PATCH v4 05/24] drm/amdkfd: enable pc sampling create

2024-02-06 Thread James Zhu
From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 59 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e9277c9beec7..9267de0bbdac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -108,7 +108,64 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   struct kfd_pc_sample_info *supported_format = NULL;
+   struct kfd_pc_sample_info user_info;
+   int ret;
+   int i;
+
+   if (user_args->num_sample_info != 1)
+   return -EINVAL;
+
+   ret = copy_from_user(&user_info, (void __user *) 
user_args->sample_info_ptr,
+   sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info from user\n");
+   return -EFAULT;
+   }
+
+   if (user_info.flags & KFD_IOCTL_PCS_FLAG_POWER_OF_2 &&
+   user_info.interval & (user_info.interval - 1)) {
+   pr_debug("Sampling interval's power is unmatched!");
+   return -EINVAL;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+   && user_info.method == 
supported_formats[i].sample_info->method
+   && user_info.type == 
supported_formats[i].sample_info->type
+   && user_info.interval <= 
supported_formats[i].sample_info->interval_max
+   && user_info.interval >= 
supported_formats[i].sample_info->interval_min) {
+   supported_format =
+   (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;
+   break;
+   }
+   }
+
+   if (!supported_format) {
+   pr_debug("Sampling format is not supported!");
+   return -EOPNOTSUPP;
+   }
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+   memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   &user_info, sizeof(user_info))) {
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : -EEXIST;
+   }
+
+   /* TODO: add trace_id return */
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
+
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f55195fea3df..96999f602224 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+   uint32_t use_count; /* Num of PC sampling sessions */
+   struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+   struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
struct mutex mutex;
+   struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1
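
Reading aid for the interval check in kfd_pc_sample_create() above: a minimal user-space sketch of the power-of-2 test, assuming a hypothetical PCS_FLAG_POWER_OF_2 stand-in for KFD_IOCTL_PCS_FLAG_POWER_OF_2 and a hypothetical helper name pcs_interval_ok(). It is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KFD_IOCTL_PCS_FLAG_POWER_OF_2 (not the uapi header). */
#define PCS_FLAG_POWER_OF_2 UINT64_C(0x1)

/*
 * Hypothetical helper: returns true when the interval is acceptable under
 * the power-of-2 restriction, or when the restriction is not requested.
 */
static bool pcs_interval_ok(uint64_t flags, uint64_t interval)
{
	if (!(flags & PCS_FLAG_POWER_OF_2))
		return true;

	/*
	 * x & (x - 1) clears the lowest set bit of x, so the result is zero
	 * only when x has at most one bit set.
	 */
	return (interval & (interval - 1)) == 0;
}
```

Note that an interval of 0 also passes this bit test; in the patch it is the separate interval_min/interval_max range comparison that rejects it.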



[PATCH v4 11/24] drm/amdkfd/gfx9: enable host trap

2024-02-06 Thread James Zhu
Enable host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index d1caaf0e6a7c..af1f678790e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820258,
+   0xbf820001, 0xbf82025e,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d4,
+   0xbf820001, 0xbf8202da,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202df,
+   0xbf820001, 0xbf8202e5,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f

[PATCH v4 06/24] drm/amdkfd: add trace_id return

2024-02-06 Thread James Zhu
Return a trace_id for each new per-device pc sampling session, and
use an IDR to quickly locate the matching pc_sampling_entry.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
mutex_init(&dev->pcs_data.mutex);
+   idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }
 
 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+   idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
mutex_destroy(&dev->pcs_data.mutex);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 9267de0bbdac..a607fc148958 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -110,6 +110,7 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
 {
struct kfd_pc_sample_info *supported_format = NULL;
struct kfd_pc_sample_info user_info;
+   struct pc_sampling_entry *pcs_entry;
int ret;
int i;
 
@@ -157,7 +158,19 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return ret ? -EFAULT : -EEXIST;
}
 
-   /* TODO: add trace_id return */
+   pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+   if (!pcs_entry) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return -ENOMEM;
+   }
+
+   i = 
idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   pcs_entry, 1, 0, GFP_KERNEL);
+   if (i < 0) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   kfree(pcs_entry);
+   return i;
+   }
 
if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
@@ -165,6 +178,11 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   pcs_entry->pdd = pdd;
+   user_args->trace_id = (uint32_t)i;
+
+   pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 96999f602224..2df240518d1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
 
@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100
 
+struct pc_sampling_entry {
+   bool enabled;
+   struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
/* The device that owns this data. */
-- 
2.25.1



[PATCH v4 16/24] drm/amdkfd: use bit operation set debug trap

2024-02-06 Thread James Zhu
The 2nd byte of the 1st-level TMA is used for the trap type setting;
use bit operations so that only the selected bit is changed.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 717a60d7a4ea..3e3cead6ccf8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1443,13 +1443,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool 
supported)
return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+   KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */
+   KFD_TRAP_TYPE_HOST,
+   KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled)
 {
if (qpd->cwsr_kaddr) {
-   uint64_t *tma =
-   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[2] = enabled;
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+   else
+   clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
}
 }
 
-- 
2.25.1
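
Reading aid for the bit-wise update described in the patch above: a minimal user-space sketch, assuming a hypothetical helper trap_flag_update() and a plain 64-bit word in place of the TMA slot. Unlike the kernel's set_bit()/clear_bit(), this sketch is not atomic; it only illustrates that one flag changes while the others are preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirroring enum KFD_TRAP_TYPE_BIT from the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/* Hypothetical helper: set or clear a single flag bit, leaving the rest intact. */
static void trap_flag_update(uint64_t *word, unsigned int bit, bool enabled)
{
	if (enabled)
		*word |= UINT64_C(1) << bit;
	else
		*word &= ~(UINT64_C(1) << bit);
}
```

With the previous whole-word assignment (tma[2] = enabled), toggling the debug flag would also have overwritten any other trap-type bit stored in the same word. The method values in the uapi enum of this series (HOSTTRAP = 1, STOCHASTIC = 2) line up with these bit positions.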



[PATCH v4 20/24] drm/amdkfd: enable pc sampling start

2024-02-06 Thread James Zhu
Enable pc sampling start.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 27 +---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index df2f4bfd0cda..6f50ba1f8989 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -95,9 +95,30 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_start = false;
+
+   pcs_entry->enabled = true;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, 
true);
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+   pc_sampling_start = true;
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   while (pc_sampling_start) {
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+   usleep_range(1000, 2000);
+   else
+   break;
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -269,7 +290,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_start(pdd);
+   return kfd_pc_sample_start(pdd, pcs_entry);
 
case KFD_IOCTL_PCS_OP_STOP:
if (!pcs_entry->enabled)
-- 
2.25.1



[PATCH v4 14/24] drm/amdkfd: trigger pc sampling trap for arcturus

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct 
amdgpu_device *adev,
 
return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1



[PATCH v4 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

The user-mode code that uses this new kfd_ioctl is available on the
master branch of
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 include/uapi/linux/kfd_ioctl.h | 61 +-
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ce46edc62a5..ec1b6404b185 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1447,6 +1447,62 @@ struct kfd_ioctl_dbg_trap_args {
};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a 
per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously 
registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking samples from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking samples from a 
previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 interval;  /* [IN] if PCS_TYPE_TIME_US: sample interval in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in 
graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in 
instructions issued by
+ * graphics compute units
+ */
+   __u64 interval_min;  /* [OUT] */
+   __u64 interval_max;  /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g 
FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+#define KFD_IOCTL_PCS_QUERY_TYPE_FULL (1 << 0) /* If not set, return current */
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+   __u32 flags; /* kfd_ioctl_pcs_query flags */
+   __u32 reserved;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1567,7 +1623,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP\
AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28
 
 #endif
-- 
2.25.1



[PATCH v4 08/24] drm/amdkfd: enable pc sampling destroy

2024-02-06 Thread James Zhu
Enable pc sampling destroy.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 72c66d4bd24f..b46caa52fbe8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -186,10 +186,24 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+   pcs_entry, trace_id, pdd->dev->id);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+   idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 
trace_id);
 
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 
0x0,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kfree(pcs_entry);
+
+   return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -224,7 +238,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EBUSY;
else
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   return kfd_pc_sample_destroy(pdd, args->trace_id, 
pcs_entry);
 
case KFD_IOCTL_PCS_OP_START:
if (pcs_entry->enabled)
-- 
2.25.1



[PATCH v4 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for gfx v9.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7 
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+   uint32_t value = 0;
+
+   value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+   value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
+
+   /* select *target_simd */
+   value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+   /* select *target_wave_slot */
+   value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 
0xffffffff, 0);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   *target_wave_slot %= max_wave_slot;
+   if (!(*target_wave_slot)) {
+   (*target_simd)++;
+   *target_simd %= max_simd;
+   }
+   } else {
+   pr_debug("PC Sampling method %d not supported.", method);
+   return -EOPNOTSUPP;
+   }
+   return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct 
amdgpu_device *adev,
   uint32_t grace_period,
   uint32_t *reg_offset,
   uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method);
-- 
2.25.1
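
Reading aid for the target selection in kgd_gfx_v9_trigger_pc_sample_trap() above: a minimal user-space sketch of the round-robin walk over (simd, wave slot) pairs, assuming a hypothetical helper name pcs_advance_target(). Register programming is left out entirely; only the wrap-around arithmetic is shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: advance to the next (simd, wave_slot) target.
 * The wave slot wraps at max_wave_slot; each time it wraps to 0 the
 * SIMD index moves on and wraps at max_simd.
 */
static void pcs_advance_target(uint32_t *simd, uint32_t *wave_slot,
			       uint32_t max_simd, uint32_t max_wave_slot)
{
	*wave_slot = (*wave_slot + 1) % max_wave_slot;
	if (*wave_slot == 0)
		*simd = (*simd + 1) % max_simd;
}
```

With the values the arcturus wrapper in this series passes (10 wave slots, 4 SIMDs), successive calls visit all 40 combinations before returning to the starting pair.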



[PATCH v4 13/24] drm/amdgpu: add sq host trap status check

2024-02-06 Thread James Zhu
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+   uint32_t sq_hosttrap_status = 0x0;
+   int i, j;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+   amdgpu_gfx_select_se_sh(adev, i, j, 0xffffffff, 0);
+   sq_hosttrap_status = RREG32_SOC15(GC, 0, 
mmSQ_HOSTTRAP_STATUS);
+
+   if (sq_hosttrap_status & 
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+   WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+   
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+   sq_hosttrap_status = 0x0;
+   continue;
+   }
+   if (sq_hosttrap_status)
+   goto out;
+   }
+   }
+
+out:
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
uint32_t vmid,
uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
 {
if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
uint32_t value = 0;
+   uint32_t sq_hosttrap_status = 0x0;
+
+   sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+   /* skip when last host trap request is still pending to 
complete */
+   if (sq_hosttrap_status)
+   return 0;
 
value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 
0
 #define mmSQ_CMD   
0x037b
 #define mmSQ_CMD_BASE_IDX  
0
+#define mmSQ_HOSTTRAP_STATUS   
0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX  
0
 #define mmSQ_TIME_HI   
0x037c
 #define mmSQ_TIME_HI_BASE_IDX  
0
 #define mmSQ_TIME_LO   
0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 
   0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK   
   0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT  
   0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT  
   0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK
   0x00FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK

[PATCH v4 07/24] drm/amdkfd: check pcs_entry valid

2024-02-06 Thread James Zhu
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a607fc148958..72c66d4bd24f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -195,6 +195,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = 
idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry is only for this pc sampling process,
+* which has kfd_process->mutex protected here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -203,13 +221,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1
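
Reading aid for the per-operation state checks added to kfd_pc_sample() above: a minimal user-space sketch, assuming hypothetical names (enum pcs_op, pcs_check_state()) that mirror the ioctl operations. It models only the enabled-flag checks, not the IDR lookup or the ownership test against pdd.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of enum kfd_ioctl_pc_sample_op. */
enum pcs_op {
	PCS_OP_QUERY_CAPABILITIES,
	PCS_OP_CREATE,
	PCS_OP_DESTROY,
	PCS_OP_START,
	PCS_OP_STOP,
};

/*
 * Hypothetical helper: returns 0 when the operation may proceed for a
 * session whose current state is 'enabled', or a negative errno otherwise.
 */
static int pcs_check_state(enum pcs_op op, bool enabled)
{
	switch (op) {
	case PCS_OP_DESTROY:
		return enabled ? -EBUSY : 0;     /* must be stopped first */
	case PCS_OP_START:
		return enabled ? -EALREADY : 0;  /* already sampling */
	case PCS_OP_STOP:
		return enabled ? 0 : -EALREADY;  /* already stopped */
	default:
		return 0;                        /* query/create need no entry */
	}
}
```

The distinct error codes let user space tell a redundant start or stop (-EALREADY) apart from a destroy attempted on a session that is still sampling (-EBUSY).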



[PATCH v4 04/24] drm/amdkfd: add pc sampling mutex

2024-02-06 Thread James Zhu
Add a per-node pc sampling mutex; initialize it in node init and destroy it in node cleanup.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+   mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+   mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
}
 
kfd_smi_init(node);
+   kfd_pc_sampling_init(node);
 
return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned 
int num_nodes)
kfd_topology_remove_device(knode);
if (knode->gws)
amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+   kfd_pc_sampling_exit(knode);
kfree(knode);
kfd->nodes[i] = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index ae9a41670909..f55195fea3df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+   struct mutex mutex;
+};
+
 struct kfd_node {
unsigned int node_id;
struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
struct kfd_local_mem_info local_mem_info;
 
struct kfd_dev *kfd;
+
+   struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1



[PATCH v4 03/24] drm/amdkfd: enable pc sampling query

2024-02-06 Thread James Zhu
From: David Yat Sin 

Enable pc sampling to query system capability.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 65 +++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..e9277c9beec7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,73 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+   uint32_t ip_version;
+   const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+   0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+   { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+   { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   uint64_t sample_offset;
+   int num_method = 0;
+   int ret;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+   num_method++;
+
+   if (!num_method) {
+   pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+   pdd->dev->adev->ip_versions[GC_HWIP][0]);
+   return -EOPNOTSUPP;
+   }
+
+   ret = 0;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (user_args->flags != KFD_IOCTL_PCS_QUERY_TYPE_FULL &&
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count) {
+   /* If we already have a session, restrict returned list to current method */
+   user_args->num_sample_info = 1;
+
+   if (user_args->sample_info_ptr)
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : 0;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   if (!user_args->sample_info_ptr || user_args->num_sample_info < 
num_method) {
+   user_args->num_sample_info = num_method;
+   pr_debug("ASIC requires space for %d kfd_pc_sample_info 
entries.", num_method);
+   return -ENOSPC;
+   }
+
+   sample_offset = user_args->sample_info_ptr;
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == 
supported_formats[i].ip_version) {
+   ret = copy_to_user((void __user *) sample_offset,
+   supported_formats[i].sample_info, sizeof(struct 
kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info to 
user.");
+   return -EFAULT;
+   }
+   sample_offset += sizeof(struct kfd_pc_sample_info);
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1
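
For illustration only: the query path above follows a "size first, then fetch" pattern, where a caller that passes no buffer (or too small a buffer) gets -ENOSPC together with the required entry count, and then retries with enough space. A minimal userspace model of that handshake might look as follows. `struct sample_info`, the `supported` table and `pcs_query_model()` are hypothetical; the model ignores the IP-version matching and the active-session shortcut that the patch implements, and the table values are placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct kfd_pc_sample_info. */
struct sample_info {
	uint32_t method;
	uint32_t type;
};

/* Hypothetical capability table; the values are placeholders. */
static const struct sample_info supported[] = {
	{ 1, 1 },
	{ 2, 1 },
};

#define NUM_SUPPORTED ((uint32_t)(sizeof(supported) / sizeof(supported[0])))

/*
 * Model of the two-step query: with no buffer, or one that is too
 * small, report the required entry count through *num and return
 * -ENOSPC; otherwise copy all entries and return 0.
 */
int pcs_query_model(struct sample_info *buf, uint32_t *num)
{
	assert(num != NULL);

	if (buf == NULL || *num < NUM_SUPPORTED) {
		*num = NUM_SUPPORTED;
		return -ENOSPC;
	}

	memcpy(buf, supported, sizeof(supported));
	return 0;
}
```

A caller would first invoke `pcs_query_model(NULL, &n)` to learn `n`, allocate `n` entries, and call again with the buffer.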



[PATCH v4 02/24] drm/amdkfd: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin 

Add pc sampling functions in amdkfd.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 
 5 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index a5ae7bcf44eb..790fd028a681 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -57,7 +57,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_debug.o
+   $(AMDKFD_PATH)/kfd_debug.o \
+   $(AMDKFD_PATH)/kfd_pc_sampling.o
 
 ifneq ($(CONFIG_DEBUG_FS),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 80e90fdef291..d9cac97c54c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -41,6 +41,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_svm.h"
+#include "kfd_pc_sampling.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
@@ -1745,6 +1746,39 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int kfd_ioctl_pc_sample(struct file *filep,
+  struct kfd_process *p, void __user *data)
+{
+   struct kfd_ioctl_pc_sample_args *args = data;
+   struct kfd_process_device *pdd;
+   int ret = 0;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("PC Sampling does not support sched_policy %i", 
sched_policy);
+   return -EINVAL;
+   }
+
+   mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+
+   if (!pdd) {
+   pr_debug("could not find gpu id 0x%x.", args->gpu_id);
+   ret = -EINVAL;
+   } else if (args->op == KFD_IOCTL_PCS_OP_START) {
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
+   if (IS_ERR(pdd)) {
+   pr_debug("failed to bind process %p with gpu id 0x%x", 
p, args->gpu_id);
+   ret = -ESRCH;
+   }
+   }
+
+   if (!ret)
+   ret = kfd_pc_sample(pdd, args);
+   mutex_unlock(&p->mutex);
+
+   return ret;
+}
+
 static int criu_checkpoint_process(struct kfd_process *p,
 uint8_t __user *user_priv_data,
 uint64_t *priv_offset)
@@ -3219,6 +3253,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
kfd_ioctl_set_debug_trap, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
+   kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT ARRAY_SIZE(amdkfd_ioctls)
@@ -3295,6 +3332,14 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
}
}
 
+   /* PC Sampling Monitor */
+   if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+   if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+   retcode = -EACCES;
+   goto err_i1;
+   }
+   }
+
if (cmd & (IOC_IN | IOC_OUT)) {
if (asize <= sizeof(stack_kdata)) {
kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
new file mode 100644
index ..a7e78ff42d07
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above 

[PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942

2024-02-06 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an
application are periodically interrupted and the program counter that the
threads are currently attempting to execute is saved out for profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: Set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   75 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  426 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   46 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |7 +
 include/uapi/linux/kfd_ioctl.h|   64 +-
 21 files changed, 1914 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



Re: [PATCH] drm/amdgpu: make a correction on comment

2024-01-08 Thread James Zhu



On 2024-01-08 03:12, Christian König wrote:

Am 02.01.24 um 21:56 schrieb James Zhu:

Current AMDGPU_VM_RESERVED_VRAM is updated to 8M.

Signed-off-by: James Zhu 


Maybe remove the value completely from the comment, just something 
like "How much memory be reserved for page tables".

[JZ] This will work better. Thanks!


Either way Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index b6cd565562ad..b788067b9158 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -116,7 +116,7 @@ struct amdgpu_mem_stats;
  #define AMDGPU_VM_FAULT_STOP_FIRST    1
  #define AMDGPU_VM_FAULT_STOP_ALWAYS    2
  -/* Reserve 4MB VRAM for page tables */
+/* Reserve 8MB VRAM for page tables */
  #define AMDGPU_VM_RESERVED_VRAM    (8ULL << 20)
    /*




[PATCH] drm/amdgpu: make a correction on comment

2024-01-02 Thread James Zhu
Current AMDGPU_VM_RESERVED_VRAM is updated to 8M.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b6cd565562ad..b788067b9158 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -116,7 +116,7 @@ struct amdgpu_mem_stats;
 #define AMDGPU_VM_FAULT_STOP_FIRST 1
 #define AMDGPU_VM_FAULT_STOP_ALWAYS 2
 
-/* Reserve 4MB VRAM for page tables */
+/* Reserve 8MB VRAM for page tables */
 #define AMDGPU_VM_RESERVED_VRAM (8ULL << 20)
 
 /*
-- 
2.25.1
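
For illustration only: the corrected comment can be checked against the macro value with a compile-time assertion. The sketch below copies the expression `(8ULL << 20)` from the patch into a local macro; `RESERVED_VRAM` and `reserved_vram_mib()` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the expression used by AMDGPU_VM_RESERVED_VRAM. */
#define RESERVED_VRAM (8ULL << 20)

/* 1 << 20 bytes is 1 MiB, so 8 << 20 bytes is 8 MiB. */
_Static_assert(RESERVED_VRAM == 8ULL * 1024 * 1024, "expected 8 MiB");

/* Returns the reserved size in MiB. */
uint64_t reserved_vram_mib(void)
{
	return RESERVED_VRAM >> 20;
}
```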



Re: [PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling

2024-01-02 Thread James Zhu


On 2023-12-15 10:59, James Zhu wrote:

From: David Yat Sin

We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC
Sampling so that the TTMP registers are valid inside the sampling data.
runtime_info.ttmp_setup will be cleared when the user application
does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without
KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit.

It is also not valid to have the debugger attached to a process while PC
sampling is enabled so adding some checks to prevent this.

Signed-off-by: David Yat Sin
---
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 --
  drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 22 ++
  drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 43 +---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  4 +-
  5 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1a3a8ded9c93..f7a8794c2bde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1775,7 +1775,7 @@ static int kfd_ioctl_pc_sample(struct file *filep,
pr_debug("failed to bind process %p with gpu id 0x%x", p, 
args->gpu_id);
ret = -ESRCH;
} else {
-   ret = kfd_pc_sample(pdd, args);
+   ret = kfd_pc_sample(p, pdd, args);
}
}
mutex_unlock(&p->mutex);
@@ -2808,26 +2808,9 @@ static int runtime_enable(struct kfd_process *p, 
uint64_t r_debug,
  
  	p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;

p->runtime_info.r_debug = r_debug;
-   p->runtime_info.ttmp_setup = enable_ttmp_setup;
  
-	if (p->runtime_info.ttmp_setup) {

-   for (i = 0; i < p->n_pdds; i++) {
-   struct kfd_process_device *pdd = p->pdds[i];
-
-   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
-   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   true,
-   
pdd->dev->vm_info.last_vmid_kfd);
-   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
-   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   false,
-   0);
-   }
-   }
-   }
+   if (enable_ttmp_setup)
+   kfd_dbg_enable_ttmp_setup(p);
  
  retry:

if (p->debug_trap_enabled) {
@@ -2976,9 +2959,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
goto out;
}
  
-	/* Check if target is still PTRACED. */

rcu_read_lock();
-   if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
+
+   if (kfd_pc_sampling_enabled(target)) {
+   pr_debug("Cannot enable debug trap on PID:%d because PC Sampling 
active\n", args->pid);
+   r = -EBUSY;
+   /* Check if target is still PTRACED. */
+   } else if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
&& ptrace_parent(target->lead_thread) != 
current) {
pr_err("PID %i is not PTRACED and cannot be debugged\n", 
args->pid);
r = -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 9ec750666382..092c2dc84d24 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1118,3 +1118,25 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct 
kfd_process *target,
  
  	mutex_unlock(&target->event_mutex);

  }
+
+void kfd_dbg_enable_ttmp_setup(struct kfd_process *p)
+{
+   int i;
+   p->runtime_info.ttmp_setup = true;
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   true,
+   pdd->dev->vm_info.last_vmid_kfd);
+   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
+   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable

[PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling

2023-12-15 Thread James Zhu
From: David Yat Sin 

We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC
Sampling so that the TTMP registers are valid inside the sampling data.
runtime_info.ttmp_setup will be cleared when the user application
does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without
KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit.

It is also not valid to have the debugger attached to a process while PC
sampling is enabled so adding some checks to prevent this.

Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 --
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c   | 22 ++
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 43 +---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  4 +-
 5 files changed, 75 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 1a3a8ded9c93..f7a8794c2bde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1775,7 +1775,7 @@ static int kfd_ioctl_pc_sample(struct file *filep,
pr_debug("failed to bind process %p with gpu id 0x%x", 
p, args->gpu_id);
ret = -ESRCH;
} else {
-   ret = kfd_pc_sample(pdd, args);
+   ret = kfd_pc_sample(p, pdd, args);
}
}
mutex_unlock(&p->mutex);
@@ -2808,26 +2808,9 @@ static int runtime_enable(struct kfd_process *p, 
uint64_t r_debug,
 
p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;
p->runtime_info.r_debug = r_debug;
-   p->runtime_info.ttmp_setup = enable_ttmp_setup;
 
-   if (p->runtime_info.ttmp_setup) {
-   for (i = 0; i < p->n_pdds; i++) {
-   struct kfd_process_device *pdd = p->pdds[i];
-
-   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
-   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-   pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   true,
-   
pdd->dev->vm_info.last_vmid_kfd);
-   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
-   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap(
-   pdd->dev->adev,
-   false,
-   0);
-   }
-   }
-   }
+   if (enable_ttmp_setup)
+   kfd_dbg_enable_ttmp_setup(p);
 
 retry:
if (p->debug_trap_enabled) {
@@ -2976,9 +2959,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, 
struct kfd_process *p, v
goto out;
}
 
-   /* Check if target is still PTRACED. */
rcu_read_lock();
-   if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
+
+   if (kfd_pc_sampling_enabled(target)) {
+   pr_debug("Cannot enable debug trap on PID:%d because PC 
Sampling active\n", args->pid);
+   r = -EBUSY;
+   /* Check if target is still PTRACED. */
+   } else if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
&& ptrace_parent(target->lead_thread) != 
current) {
pr_err("PID %i is not PTRACED and cannot be debugged\n", 
args->pid);
r = -EPERM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 9ec750666382..092c2dc84d24 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1118,3 +1118,25 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct 
kfd_process *target,
 
mutex_unlock(&target->event_mutex);
 }
+
+void kfd_dbg_enable_ttmp_setup(struct kfd_process *p)
+{
+   int i;
+   p->runtime_info.ttmp_setup = true;
+   for (i = 0; i < p->n_pdds; i++) {
+   struct kfd_process_device *pdd = p->pdds[i];
+
+   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+   amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+   pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   true,
+   pdd->dev->vm_info.last_vmid_kfd);
+   } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
+   pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap(
+   pdd->dev->adev,
+   false,
+   0);
+   }
+   }
+}
\ No newline at end of file
diff --git 

[PATCH v3 17/24] drm/amdkfd: add setting trap pc sampling flag

2023-12-15 Thread James Zhu
Add setting trap pc sampling flag.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7ca7cc726246..b9a36891d099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 1a31b556a5ff..6bc9dcfad484 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1460,6 +1460,19 @@ void kfd_process_set_trap_debug_flag(struct 
qcm_process_device *qpd,
}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(method, &tma[2]);
+   else
+   clear_bit(method, &tma[2]);
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1
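
For illustration only: the flag update above sets or clears bit number `method` in the word at index 2 of the TMA. A non-atomic userspace model is sketched below. It assumes a 64-bit `unsigned long` and a method number below 64; `tma_set_pcs_flag()` and `TMA_FLAG_WORD` are hypothetical names, and the kernel's `set_bit()`/`clear_bit()` are atomic whereas this sketch is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Word index used by the patch for the PC sampling flags. */
#define TMA_FLAG_WORD 2

/*
 * Set or clear the bit for one sampling method in the flag word.
 * Plain read-modify-write; atomicity is deliberately left out.
 */
void tma_set_pcs_flag(uint64_t *tma, unsigned int method, bool enabled)
{
	assert(method < 64);

	uint64_t mask = UINT64_C(1) << method;

	if (enabled)
		tma[TMA_FLAG_WORD] |= mask;
	else
		tma[TMA_FLAG_WORD] &= ~mask;
}
```

Because each method owns its own bit, enabling or disabling one method leaves the flags of the others untouched.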



[PATCH v3 18/24] drm/amdkfd: enable pc sampling stop

2023-12-15 Thread James Zhu
Enable pc sampling stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 07e4c4a32e7b..02fa481d7457 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -88,10 +88,32 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd)
return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_stop = false;
+
+   pcs_entry->enabled = false;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true);
+   pc_sampling_stop = true;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   if (pc_sampling_stop) {
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+   pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -233,7 +255,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (!pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_stop(pdd);
+   return kfd_pc_sample_stop(pdd, pcs_entry);
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b9a36891d099..0839a0ca3099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   uint32_t active_count;  /* Num of active sessions */
+   uint32_t target_simd;   /* target simd for trap */
+   uint32_t target_wave_slot;  /* target wave slot for trap */
+   bool stop_enable;   /* pc sampling stop in process */
struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1
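
For illustration only: the stop path above is a "last one out" pattern, where every stop decrements `active_count` and only the stop that brings it to zero performs the device-wide teardown. A single-threaded model is sketched below. `struct pcs_dev`, `pcs_session_stop()` and `pcs_finish_stop()` are hypothetical names; the locking around the counter (pcs_data.mutex in the patch) is omitted, and the underflow guard is an addition that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced copy of the per-device sampling state. */
struct pcs_dev {
	uint32_t active_count;     /* number of running sessions */
	uint32_t target_simd;      /* next SIMD to trap */
	uint32_t target_wave_slot; /* next wave slot to trap */
	bool stop_enable;          /* teardown in progress */
};

/*
 * Drop one running session. Returns true only for the caller that
 * stopped the last session and therefore has to run the teardown.
 */
bool pcs_session_stop(struct pcs_dev *d)
{
	/* Underflow guard: not part of the patch, added for the model. */
	if (d->active_count == 0)
		return false;

	if (--d->active_count == 0) {
		d->stop_enable = true;
		return true;
	}
	return false;
}

/* Teardown step: reset the trap targets and clear the in-progress flag. */
void pcs_finish_stop(struct pcs_dev *d)
{
	d->target_simd = 0;
	d->target_wave_slot = 0;
	d->stop_enable = false;
}
```

In the patch, the trap flag is cleared between these two steps, outside the mutex, which is why `stop_enable` exists as a separate in-progress marker.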



[PATCH v3 22/24] drm/amdkfd: add pc sampling release when process release

2023-12-15 Thread James Zhu
Add PC sampling release on process release; it forces all active
sessions that belong to the process to stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index c95d9ff08f6a..d8286aabd5a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -300,6 +300,27 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+   struct pc_sampling_entry *pcs_entry;
+   struct idr *idp;
+   uint32_t id;
+
+   /* force to release all PC sampling task for this process */
+   idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   idr_for_each_entry(idp, pcs_entry, id) {
+   if (pcs_entry->pdd != pdd)
+   continue;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   if (pcs_entry->enabled)
+   kfd_pc_sample_stop(pdd, pcs_entry);
+   kfd_pc_sample_destroy(pdd, id, pcs_entry);
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6bc9dcfad484..1f8d6098dfb2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process 
*p)
pr_debug("Releasing pdd (topology id %d) for process (pasid 
0x%x)\n",
pdd->dev->id, p->pasid);
 
+   kfd_pc_sample_release(pdd);
+
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1



[PATCH v3 11/24] drm/amdkfd/gfx9: enable host trap

2023-12-15 Thread James Zhu
Enable host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index df75863393fc..747426bd5181 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820258,
+   0xbf820001, 0xbf82025e,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d4,
+   0xbf820001, 0xbf8202da,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202df,
+   0xbf820001, 0xbf8202e5,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f

[PATCH v3 13/24] drm/amdgpu: add sq host trap status check

2023-12-15 Thread James Zhu
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+   uint32_t sq_hosttrap_status = 0x0;
+   int i, j;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+   amdgpu_gfx_select_se_sh(adev, i, j, 0xffffffff, 0);
+   sq_hosttrap_status = RREG32_SOC15(GC, 0, 
mmSQ_HOSTTRAP_STATUS);
+
+   if (sq_hosttrap_status & 
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+   WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+   
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+   sq_hosttrap_status = 0x0;
+   continue;
+   }
+   if (sq_hosttrap_status)
+   goto out;
+   }
+   }
+
+out:
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
uint32_t vmid,
uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
 {
if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
uint32_t value = 0;
+   uint32_t sq_hosttrap_status = 0x0;
+
+   sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+   /* skip when last host trap request is still pending to 
complete */
+   if (sq_hosttrap_status)
+   return 0;
 
value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 
0
 #define mmSQ_CMD   
0x037b
 #define mmSQ_CMD_BASE_IDX  
0
+#define mmSQ_HOSTTRAP_STATUS   
0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX  
0
 #define mmSQ_TIME_HI   
0x037c
 #define mmSQ_TIME_HI_BASE_IDX  
0
 #define mmSQ_TIME_LO   
0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 
   0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK   
   0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT  
   0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT  
   0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK
   0x00FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK
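
The status check in this patch can be illustrated outside the kernel with a
minimal sketch. It is not the driver code: the register reads are replaced by
a plain array with one entry per SE/SH pair, the function name is
hypothetical, and the override bit position (bit 8) is an assumption taken
from the HTPENDING_OVERRIDE shift above. Zeroing an entry stands in for the
WREG32_SOC15 write in the patch; what that write does in hardware is not
modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the 0x8 shift above: override flag lives in bit 8. */
#define HTPENDING_OVERRIDE_MASK 0x00000100u

/*
 * Hypothetical stand-in for kgd_aldebaran_get_hosttrap_status().
 * status[] simulates one SQ_HOSTTRAP_STATUS value per SE/SH pair.
 * Returns the first non-zero status that is not flagged as override,
 * or 0 when no host trap is pending.
 */
static uint32_t hosttrap_pending_status(uint32_t *status, size_t count)
{
    assert(status != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (status[i] & HTPENDING_OVERRIDE_MASK) {
            /* Simulated acknowledge; the real code writes the register. */
            status[i] = 0;
            continue;
        }
        if (status[i])
            return status[i];
    }
    return 0;
}
```

A caller would skip firing a new trap whenever the return value is non-zero,
which is how the trigger path in the patch uses the real helper.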

[PATCH v3 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2023-12-15 Thread James Zhu
Bump the minor version to declare that the pc sampling feature is now
available.

Signed-off-by: James Zhu 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1bd1347effea..62d8642d3d1c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -40,9 +40,10 @@
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
+ * - 1.15 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 14
+#define KFD_IOCTL_MINOR_VERSION 15
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1
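
For readers wondering how userspace might consume the new minor version, here
is a minimal sketch. The struct and function names are hypothetical and only
mirror the major/minor pair reported by the version ioctl; the rule "major 1,
minor 15 or later" is an assumption drawn from the version comment in this
patch, not a documented compatibility policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the version pair returned by the KFD. */
struct kfd_version {
    uint32_t major;
    uint32_t minor;
};

/* True when the reported interface version advertises PC sampling. */
static bool kfd_version_has_pc_sampling(const struct kfd_version *v)
{
    assert(v != NULL);

    return v->major == 1 && v->minor >= 15;
}
```

A profiler could run this check once after opening the device and fall back
to other profiling methods when it returns false.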



[PATCH v3 12/24] drm/amdgpu: use trapID 4 for host trap

2023-12-15 Thread James Zhu
TRAPSTS.HOST_TRAP does not work before gfx943, so use TTMP1
(bit 24: HT, bits 16-23: trapID) to identify the host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 +
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 +
 3 files changed, 1070 insertions(+), 1054 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 7d8c0e13ac12..adfe5e5585e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
/* select *target_wave_slot */
value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+   /* set TrapID 4 for HOSTTRAP */
+   value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4);
 
mutex_lock(>grbm_idx_mutex);
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 747426bd5181..44955838f307 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf82025e,
+   0xbf820001, 0xbf820263,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
-   0x00ff, 0xbf85001e,
+   0x00ff, 0xbf850023,
0x866eff7b, 0x0400,
-   0xbf85005b, 0xbf8e0010,
+   0xbf850060, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
-   0xbf850015, 0x866eff7b,
-   0x71ff, 0xbf840008,
-   0x866fff7b, 0x7080,
-   0xbf840001, 0xbeee1a87,
-   0xb8eff801, 0x8e6e8c6e,
-   0x866e6f6e, 0xbf85000a,
-   0x866eff6d, 0x00ff,
-   0xbf850007, 0xb8eef801,
-   0x866eff6e, 0x0800,
-   0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850040,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xbf0d8f7b, 0xbf840002,
-   0x877bff7b, 0x,
-   0xc0031c3d, 0x0010,
-   0xc0071bbd, 0x,
-   0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x8671ff6d,
-   0x0100, 0xbf840004,
-   0x92f1ff70, 0x00010001,
-   0xbf840016, 0xbf820005,
-   0x86708170, 0x8e709770,
-   0x8977ff77, 0x0080,
-   0x8077, 0x86ee6e6e,
-   0xbf840001, 0xbe801d6e,
-   0x866eff6d, 0x01ff,
-   0xbf850005, 0x8778ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x866eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f7a8b77,
+   0xbf85001a, 0x866eff6d,
+   0x01ff, 0xbf06ff6e,
+   0x0104, 0xbf850015,
+   0x866eff7b, 0x71ff,
+   0xbf840008, 0x866fff7b,
+   0x7080, 0xbf840001,
+   0xbeee1a87, 0xb8eff801,
+   0x8e6e8c6e, 0x866e6f6e,
+   0xbf85000a, 0x866eff6d,
+   0x00ff, 0xbf850007,
+   0xb8eef801, 0x866eff6e,
+   0x0800, 0xbf850003,
+   0x866eff7b, 0x0400,
+   0xbf850040, 0xb8faf807,
0x867aff7a, 0x001f8000,
-   0xb97af807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xbf0d8f7b,
+   0xbf840002, 0x877bff7b,
+   0x, 0xc0031c3d,
+   0x0010, 0xc0071bbd,
+   0x, 0xc0071ebd,
+   0x0008, 0xbf8cc07f,
+   0x8671ff6d, 0x0100,
+   0xbf840004, 0x92f1ff70,
+   0x00010001, 0xbf840016,
+   0xbf820005, 0x86708170,
+   0x8e709770, 0x8977ff77,
+   0x0080, 0x8077,
+   0x86ee6e6e, 0xbf840001,
+   0xbe801d6e, 0x866eff6d,
+   0x01ff, 0xbf850005,
+   0x8778ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820005, 0x866eff6d,
+   0x0100, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbf94, 0x877a8478,
-   0xb97af802, 0xbf8e0002,
-   0xbf88fffe, 0xb8fa2a05,
-   0x807a817a, 0x8e7a8
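
The TTMP1 layout named in the commit message (bit 24: HT, bits 16-23: trapID)
can be decoded as in the sketch below. The macro and function names are
hypothetical, and requiring both HT set and trapID equal to 4 is an
assumption about how the two fields are combined; the trap handler assembly
remains the authoritative reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TTMP1_TRAP_ID_SHIFT 16u
#define TTMP1_TRAP_ID_MASK  0xffu
#define TTMP1_HT_BIT        (UINT32_C(1) << 24)
#define HOST_TRAP_ID        4u

static_assert((HOST_TRAP_ID & ~TTMP1_TRAP_ID_MASK) == 0,
              "trap ID must fit in the 8-bit trapID field");

/* Extract the trapID field (bits 16-23) from a TTMP1 value. */
static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
    return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* Assumed rule: a host trap has the HT bit set and carries trapID 4. */
static bool ttmp1_is_host_trap(uint32_t ttmp1)
{
    return (ttmp1 & TTMP1_HT_BIT) != 0 &&
           ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```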

[PATCH v3 04/24] drm/amdkfd: add pc sampling mutex

2023-12-15 Thread James Zhu
Add a per-node pc sampling mutex. Initialize it during node init and
destroy it during node cleanup.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
spin_lock_init(>smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+   mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+   mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
}
 
kfd_smi_init(node);
+   kfd_pc_sampling_init(node);
 
return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned 
int num_nodes)
kfd_topology_remove_device(knode);
if (knode->gws)
amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+   kfd_pc_sampling_exit(knode);
kfree(knode);
kfd->nodes[i] = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 99426182bfc6..cbaa1bccd94b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+   struct mutex mutex;
+};
+
 struct kfd_node {
unsigned int node_id;
struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
struct kfd_local_mem_info local_mem_info;
 
struct kfd_dev *kfd;
+
+   struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1



[PATCH v3 21/24] drm/amdkfd: add pc sampling thread to trigger trap

2023-12-15 Thread James Zhu
Add a kthread to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 42282f130fc3..c95d9ff08f6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,66 @@ struct supported_pc_sample_info supported_formats[] = {
{ IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+   struct amdgpu_device *adev;
+   struct kfd_node *node = param;
+   uint32_t timeout = 0;
+
+   mutex_lock(&node->pcs_data.mutex);
+   if (node->pcs_data.hosttrap_entry.base.active_count &&
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+   node->kfd2kgd->trigger_pc_sample_trap) {
+   switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) 
{
+   case KFD_IOCTL_PCS_TYPE_TIME_US:
+   timeout = 
(uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+   break;
+   default:
+   pr_debug("PC Sampling type %d not supported.",
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+   }
+   }
+   mutex_unlock(&node->pcs_data.mutex);
+   if (!timeout)
+   return -EINVAL;
+
+   adev = node->adev;
+
+   allow_signal(SIGKILL);
+   while (!kthread_should_stop() ||
+   
!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {
+   node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+   &node->pcs_data.hosttrap_entry.base.target_simd,
+   &node->pcs_data.hosttrap_entry.base.target_wave_slot,
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+   pr_debug_ratelimited("triggered a host trap.");
+
+   if 
(signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread))
+   break;
+   usleep_range(timeout, timeout + 10);
+   }
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+   return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+   char thread_name[16];
+   int ret = 0;
+
+   snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+   kthread_run(kfd_pc_sample_thread, node, thread_name);
+   if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   ret = 
PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+   pr_debug("Failed to create pc sample thread for %s.\n", 
thread_name);
+   }
+
+   return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
@@ -88,6 +148,7 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
struct pc_sampling_entry *pcs_entry)
 {
bool pc_sampling_start = false;
+   int ret = 0;
 
pcs_entry->enabled = true;
mutex_lock(>dev->pcs_data.mutex);
@@ -102,11 +163,13 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
} else {
kfd_process_set_trap_pc_sampling_flag(>qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   if 
(!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   ret = kfd_pc_sample_thread_start(pdd->dev);
break;
}
}
 
-   return 0;
+   return ret;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -124,6 +187,9 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
mutex_unlock(>dev->pcs_data.mutex);
 
if (pc_sampling_stop) {
+   
kthread_stop(pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   while (pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   usleep_range(1000, 2000);
kfd_process_set_trap_pc_sampling_flag(>qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
remap_queue(pd

[PATCH v3 20/24] drm/amdkfd: enable pc sampling start

2023-12-15 Thread James Zhu
Enable pc sampling start.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index c9fd5b2a3330..42282f130fc3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -84,9 +84,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_start = false;
+
+   pcs_entry->enabled = true;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+   pc_sampling_start = true;
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   while (pc_sampling_start) {
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
+   usleep_range(1000, 2000);
+   } else {
+   kfd_process_set_trap_pc_sampling_flag(>qpd,
+   
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   break;
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -252,7 +272,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_start(pdd);
+   return kfd_pc_sample_start(pdd, pcs_entry);
 
case KFD_IOCTL_PCS_OP_STOP:
if (!pcs_entry->enabled)
-- 
2.25.1



[PATCH v3 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device 
*adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1



[PATCH v3 19/24] drm/amdkfd: add queue remapping

2023-12-15 Thread James Zhu
Add queue remapping to ensure that any waves executing the PC sampling
part of the trap handler are done before kfd_pc_sample_stop returns,
and that no new waves enter that part of the trap handler afterwards.
This avoids race conditions that could lead to use-after-free. Unmapping
and remapping the queues either waits for the waves to drain, or preempts
them with CWSR, which itself executes a trap and waits for previous traps
to finish.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager 
*dqm)
return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period)
+{
+   dqm_lock(dqm);
+   if (!dqm->dev->kfd->shared_resources.enable_mes)
+   execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+   dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 02fa481d7457..c9fd5b2a3330 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
uint32_t ip_version;
@@ -105,6 +106,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
if (pc_sampling_stop) {
kfd_process_set_trap_pc_sampling_flag(>qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+   remap_queue(pdd->dev->dqm,
+   KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);
 
mutex_lock(>dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
-- 
2.25.1



[PATCH v3 16/24] drm/amdkfd: use bit operation set debug trap

2023-12-15 Thread James Zhu
The 2nd byte of the 1st-level TMA is used for the trap type setting.
Use bit operations so that only the selected bit is changed.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 71df51fcc1b0..1a31b556a5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1440,13 +1440,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool 
supported)
return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+   KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */
+   KFD_TRAP_TYPE_HOST,
+   KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled)
 {
if (qpd->cwsr_kaddr) {
-   uint64_t *tma =
-   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[2] = enabled;
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+   else
+   clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
}
 }
 
-- 
2.25.1



[PATCH v3 09/24] drm/amdkfd: add interface to trigger pc sampling trap

2023-12-15 Thread James Zhu
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..05b0255aca37 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -33,6 +33,7 @@
 #include 
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
+#include 
 
 struct pci_dev;
 struct amdgpu_device;
@@ -318,6 +319,11 @@ struct kfd2kgd_calls {
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
+   uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1



[PATCH v3 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for gfx v9.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7 
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+   uint32_t value = 0;
+
+   value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+   value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
+
+   /* select *target_simd */
+   value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+   /* select *target_wave_slot */
+   value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   *target_wave_slot %= max_wave_slot;
+   if (!(*target_wave_slot)) {
+   (*target_simd)++;
+   *target_simd %= max_simd;
+   }
+   } else {
+   pr_debug("PC Sampling method %d not supported.", method);
+   return -EOPNOTSUPP;
+   }
+   return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct 
amdgpu_device *adev,
   uint32_t grace_period,
   uint32_t *reg_offset,
   uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method);
-- 
2.25.1
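
The way this patch walks the sampling target across wave slots and SIMDs can
be isolated into a small helper, sketched below. The function name is
hypothetical; the arithmetic follows the increment and modulo steps in
kgd_gfx_v9_trigger_pc_sample_trap(), without the register write.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the (SIMD, wave slot) cursor by one step.
 * The wave slot wraps at max_wave_slot; each wrap moves to the next
 * SIMD, which in turn wraps at max_simd.
 */
static void pcs_advance_target(uint32_t *target_simd,
                               uint32_t *target_wave_slot,
                               uint32_t max_wave_slot,
                               uint32_t max_simd)
{
    assert(max_wave_slot > 0 && max_simd > 0);

    *target_wave_slot = (*target_wave_slot + 1) % max_wave_slot;
    if (*target_wave_slot == 0)
        *target_simd = (*target_simd + 1) % max_simd;
}
```

With 8 wave slots and 4 SIMDs, the cursor returns to its starting position
after 32 calls, so every slot on every SIMD is targeted once per cycle.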



[PATCH v3 07/24] drm/amdkfd: check pcs_entry valid

2023-12-15 Thread James Zhu
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 0ea51330acd8..193a8aa94d52 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry is only for this pc sampling process,
+* which has kfd_process->mutex protected here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1



[PATCH v3 14/24] drm/amdkfd: trigger pc sampling trap for arcturus

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct 
amdgpu_device *adev,
 
return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1



[PATCH v3 08/24] drm/amdkfd: enable pc sampling destroy

2023-12-15 Thread James Zhu
Enable pc sampling destroy.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 193a8aa94d52..07e4c4a32e7b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -169,10 +169,24 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+   pcs_entry, trace_id, pdd->dev->id);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+   idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id);
 
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kvfree(pcs_entry);
+
+   return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -207,7 +221,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EBUSY;
else
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   return kfd_pc_sample_destroy(pdd, args->trace_id, 
pcs_entry);
 
case KFD_IOCTL_PCS_OP_START:
if (pcs_entry->enabled)
-- 
2.25.1



[PATCH v3 00/24] Support Host Trap Sampling for gfx941/gfx942

2023-12-15 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an
application are periodically interrupted and the program counter that the
threads are currently attempting to execute is saved out for profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   73 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c|   22 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h|3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  405 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   37 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |6 +
 include/uapi/linux/kfd_ioctl.h|   60 +-
 21 files changed, 1881 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



[PATCH v3 06/24] drm/amdkfd: add trace_id return

2023-12-15 Thread James Zhu
Return a trace_id for each new pc sampling session created on a device.
Use an IDR to quickly locate the pc_sampling_entry for a given trace_id.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
mutex_init(&dev->pcs_data.mutex);
+   idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }
 
 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+   idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
mutex_destroy(&dev->pcs_data.mutex);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 106fac0ba1b3..0ea51330acd8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
 {
struct kfd_pc_sample_info *supported_format = NULL;
struct kfd_pc_sample_info user_info;
+   struct pc_sampling_entry *pcs_entry;
int ret;
int i;
 
@@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return ret ? -EFAULT : -EEXIST;
}
 
-   /* TODO: add trace_id return */
+   pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+   if (!pcs_entry) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return -ENOMEM;
+   }
+
+   i = 
idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   pcs_entry, 1, 0, GFP_KERNEL);
+   if (i < 0) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   kfree(pcs_entry);
+   return i;
+   }
 
if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
@@ -148,6 +161,11 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   pcs_entry->pdd = pdd;
+   user_args->trace_id = (uint32_t)i;
+
+   pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index db2d09db8000..7ca7cc726246 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
 
@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100
 
+struct pc_sampling_entry {
+   bool enabled;
+   struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
/* The device that owns this data. */
-- 
2.25.1



[PATCH v3 03/24] drm/amdkfd: enable pc sampling query

2023-12-15 Thread James Zhu
From: David Yat Sin 

Enable the pc sampling query operation, which reports the sampling
capabilities of the device.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..987c415f8f0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,62 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+   uint32_t ip_version;
+   const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+   0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, 
KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+   { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+   { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   uint64_t sample_offset;
+   int num_method = 0;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+   num_method++;
+
+   if (!num_method) {
+   pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+   pdd->dev->adev->ip_versions[GC_HWIP][0]);
+   return -EOPNOTSUPP;
+   }
+
+   if (!user_args->sample_info_ptr || !user_args->num_sample_info) {
+   user_args->num_sample_info = num_method;
+   return 0;
+   }
+
+   if (user_args->num_sample_info < num_method) {
+   user_args->num_sample_info = num_method;
+   pr_debug("Sample info buffer is not large enough, "
+"ASIC requires space for %d kfd_pc_sample_info 
entries.", num_method);
+   return -ENOSPC;
+   }
+
+   sample_offset = user_args->sample_info_ptr;
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == 
supported_formats[i].ip_version) {
+   int ret = copy_to_user((void __user *) sample_offset,
+   supported_formats[i].sample_info, sizeof(struct 
kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info to 
user.");
+   return -EFAULT;
+   }
+   sample_offset += sizeof(struct kfd_pc_sample_info);
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1
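
A note for readers of the archive: the sizing-then-fetch protocol that
kfd_pc_sample_query_cap() implements above can be sketched in plain
user-space C. This is only a model under stated assumptions:
model_query_cap(), struct query_args and model_formats are hypothetical
names, memcpy() stands in for copy_to_user(), and a single host-trap format
is assumed, mirroring the gfx941/gfx942 table in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kfd_pc_sample_info (illustration only). */
struct sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Hypothetical, simplified view of the query arguments. */
struct query_args {
	struct sample_info *sample_info_ptr; /* NULL on the sizing call */
	uint32_t num_sample_info;            /* in: capacity, out: count */
};

/* Stand-in for the driver table: one supported format. */
static const struct sample_info model_formats[] = {
	{ 0, 1, ~0ULL, 0, 1 /* HOSTTRAP */, 0 /* TIME_US */ },
};
#define MODEL_NUM_FORMATS \
	((uint32_t)(sizeof(model_formats) / sizeof(model_formats[0])))

/* Model of the query path: report the count, or copy the formats out. */
static int model_query_cap(struct query_args *args)
{
	assert(args != NULL);

	/* Sizing call: no buffer or zero capacity, so only report the count. */
	if (!args->sample_info_ptr || !args->num_sample_info) {
		args->num_sample_info = MODEL_NUM_FORMATS;
		return 0;
	}

	/* Buffer too small: report the required count and fail. */
	if (args->num_sample_info < MODEL_NUM_FORMATS) {
		args->num_sample_info = MODEL_NUM_FORMATS;
		return -ENOSPC;
	}

	memcpy(args->sample_info_ptr, model_formats, sizeof(model_formats));
	return 0;
}
```

A caller would first pass a NULL pointer to learn the count, then allocate
that many entries and call again with the buffer.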



[PATCH v3 02/24] drm/amdkfd: add pc sampling support

2023-12-15 Thread James Zhu
From: David Yat Sin 

Add pc sampling functions in amdkfd.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 
 5 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index a5ae7bcf44eb..790fd028a681 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -57,7 +57,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_debug.o
+   $(AMDKFD_PATH)/kfd_debug.o \
+   $(AMDKFD_PATH)/kfd_pc_sampling.o
 
 ifneq ($(CONFIG_DEBUG_FS),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f6d4748c1980..1a3a8ded9c93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -41,6 +41,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_svm.h"
+#include "kfd_pc_sampling.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
@@ -1750,6 +1751,38 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int kfd_ioctl_pc_sample(struct file *filep,
+  struct kfd_process *p, void __user *data)
+{
+   struct kfd_ioctl_pc_sample_args *args = data;
+   struct kfd_process_device *pdd;
+   int ret;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("PC Sampling does not support sched_policy %i", 
sched_policy);
+   return -EINVAL;
+   }
+
+   mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+
+   if (!pdd) {
+   pr_debug("could not find gpu id 0x%x.", args->gpu_id);
+   ret = -EINVAL;
+   } else {
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
+   if (IS_ERR(pdd)) {
+   pr_debug("failed to bind process %p with gpu id 0x%x", 
p, args->gpu_id);
+   ret = -ESRCH;
+   } else {
+   ret = kfd_pc_sample(pdd, args);
+   }
+   }
+   mutex_unlock(&p->mutex);
+
+   return ret;
+}
+
 static int criu_checkpoint_process(struct kfd_process *p,
 uint8_t __user *user_priv_data,
 uint64_t *priv_offset)
@@ -3224,6 +3257,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
kfd_ioctl_set_debug_trap, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
+   kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
@@ -3300,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
}
}
 
+   /* PC Sampling Monitor */
+   if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+   if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+   retcode = -EACCES;
+   goto err_i1;
+   }
+   }
+
if (cmd & (IOC_IN | IOC_OUT)) {
if (asize <= sizeof(stack_kdata)) {
kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
new file mode 100644
index ..a7e78ff42d07
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission

[PATCH v3 05/24] drm/amdkfd: enable pc sampling create

2023-12-15 Thread James Zhu
From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 987c415f8f0f..106fac0ba1b3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,58 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   struct kfd_pc_sample_info *supported_format = NULL;
+   struct kfd_pc_sample_info user_info;
+   int ret;
+   int i;
+
+   if (user_args->num_sample_info != 1)
+   return -EINVAL;
+
+   ret = copy_from_user(&user_info, (void __user *) 
user_args->sample_info_ptr,
+   sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info from user\n");
+   return -EFAULT;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+   && user_info.method == 
supported_formats[i].sample_info->method
+   && user_info.type == 
supported_formats[i].sample_info->type
+   && user_info.interval <= 
supported_formats[i].sample_info->interval_max
+   && user_info.interval >= 
supported_formats[i].sample_info->interval_min) {
+   supported_format =
+   (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;
+   break;
+   }
+   }
+
+   if (!supported_format) {
+   pr_debug("Sampling format is not supported!");
+   return -EOPNOTSUPP;
+   }
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+   memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   &user_info, sizeof(user_info))) {
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : -EEXIST;
+   }
+
+   /* TODO: add trace_id return */
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
+
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbaa1bccd94b..db2d09db8000 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+   uint32_t use_count; /* Num of PC sampling sessions */
+   struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+   struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
struct mutex mutex;
+   struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1
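
For readers following the create path, the two checks in the patch above
(format match, then compatibility with an already active session) can be
modelled in user-space C roughly as follows. This is a sketch under
assumptions: struct model_dev, model_format_matches() and model_create()
are hypothetical names, locking is omitted, and plain struct assignment
stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kfd_pc_sample_info (illustration only). */
struct sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Hypothetical per-device state: session count and active configuration. */
struct model_dev {
	uint32_t use_count;
	struct sample_info active;
};

/* A request matches when method and type agree and interval is in range. */
static bool model_format_matches(const struct sample_info *user,
				 const struct sample_info *supported)
{
	assert(user && supported);
	return user->method == supported->method &&
	       user->type == supported->type &&
	       user->interval >= supported->interval_min &&
	       user->interval <= supported->interval_max;
}

/* Model of create: a second session must use the same configuration. */
static int model_create(struct model_dev *dev,
			const struct sample_info *supported,
			struct sample_info *user)
{
	if (!model_format_matches(user, supported))
		return -EOPNOTSUPP;

	if (dev->use_count && memcmp(&dev->active, user, sizeof(*user))) {
		*user = dev->active; /* report the active configuration */
		return -EEXIST;
	}

	if (!dev->use_count)
		dev->active = *user;
	dev->use_count++;
	return 0;
}
```

On -EEXIST the caller gets the active configuration back and can retry with
it, which is how several processes end up sharing one session setup.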



[PATCH v3 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-12-15 Thread James Zhu
From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

The user-mode code that uses this new kfd_ioctl is available on the
master branch of
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 include/uapi/linux/kfd_ioctl.h | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f0ed68974c54..1bd1347effea 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a 
per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously 
registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking samples from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking samples from a 
previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2 */
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 interval;  /* [IN] if PCS_TYPE_TIME_US: sample interval 
in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in 
graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in 
instructions issued by
+ * graphics compute units
+ */
+   __u64 interval_min;  /* [OUT] */
+   __u64 interval_max;  /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g 
FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP\
AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28
 
 #endif
-- 
2.25.1
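
As an aid to readers wiring this up from user space, here is a minimal
sketch of how the two structures above might be filled for a CREATE
request. The structs and enum values are mirrored locally with <stdint.h>
types so the snippet builds without the patched header; pcs_fill_create()
and the shortened PCS_* names are hypothetical and are not part of the
patch or of the Thunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi definitions in this patch (illustration only). */
enum { PCS_OP_QUERY_CAPABILITIES, PCS_OP_CREATE, PCS_OP_DESTROY,
       PCS_OP_START, PCS_OP_STOP };
enum { PCS_METHOD_HOSTTRAP = 1, PCS_METHOD_STOCHASTIC };
enum { PCS_TYPE_TIME_US, PCS_TYPE_CLOCK_CYCLES, PCS_TYPE_INSTRUCTIONS };

struct pc_sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

struct pc_sample_args {
	uint64_t sample_info_ptr; /* user pointer carried as a 64-bit value */
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
};

/* All fields are naturally aligned, so neither struct needs padding. */
static_assert(sizeof(struct pc_sample_info) == 40, "unexpected padding");
static_assert(sizeof(struct pc_sample_args) == 24, "unexpected padding");

/* Hypothetical helper: prepare a CREATE request for one host-trap session. */
static void pcs_fill_create(struct pc_sample_args *args,
			    struct pc_sample_info *info,
			    uint32_t gpu_id, uint64_t interval_us)
{
	memset(info, 0, sizeof(*info));
	info->interval = interval_us;
	info->method = PCS_METHOD_HOSTTRAP;
	info->type = PCS_TYPE_TIME_US;

	memset(args, 0, sizeof(*args));
	args->sample_info_ptr = (uint64_t)(uintptr_t)info;
	args->num_sample_info = 1; /* the create path accepts exactly one */
	args->op = PCS_OP_CREATE;
	args->gpu_id = gpu_id;
}
```

The filled args would then be passed to the AMDKFD_IOC_PC_SAMPLE ioctl; on
success the kernel is expected to write the session handle into trace_id.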



Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-13 Thread James Zhu



On 2023-12-13 11:23, Felix Kuehling wrote:


On 2023-12-13 10:24, James Zhu wrote:

Ping ...

On 2023-12-08 18:01, James Zhu wrote:

When an application tries to allocate all system memory and causes memory
to be swapped out, hmm_range_fault needs more time to validate the
remaining pages for allocation. To be safe, increase the timeout value to
1 second per 64MB of range.

Signed-off-by: James Zhu 


This is not the first time we're incrementing this timeout. Eventually 
we should get rid of that and find a way to make this work reliably 
without a timeout. There can always be situations where faults take 
longer, and we should not fail randomly in those cases.


There are also some FIXMEs in this code that should be addressed at 
the same time.


That said, as a short-term fix, this patch is

[JZ] Yes, it is just a short-term fix. The root cause is still under study.


Acked-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,

  pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
  hmm_range->start, hmm_range->end);
  -    /* Assuming 128MB takes maximum 1 second to fault page 
address */

-    timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+    /* Assuming 64MB takes maximum 1 second to fault page 
address */

+    timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
  timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
  timeout = jiffies + msecs_to_jiffies(timeout);


Re: [PATCH v2 03/23] drm/amdkfd: enable pc sampling query

2023-12-13 Thread James Zhu


On 2023-12-12 19:55, Yat Sin, David wrote:

[AMD Official Use Only - General]


-Original Message-
From: Zhu, James
Sent: Thursday, December 7, 2023 5:54 PM
To:amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix; Greathouse, Joseph
; Yat Sin, David;
Zhu, James
Subject: [PATCH v2 03/23] drm/amdkfd: enable pc sampling query

From: David Yat Sin

Enable pc sampling to query system capability.

Co-developed-by: James Zhu
Signed-off-by: James Zhu
Signed-off-by: David Yat Sin
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54
+++-
  1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..49fecbc7013e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,62 @@
  #include "amdgpu_amdkfd.h"
  #include "kfd_pc_sampling.h"

+struct supported_pc_sample_info {
+ uint32_t ip_version;
+ const struct kfd_pc_sample_info *sample_info; };
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+ 0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP,
+KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+ { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+ { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 }, };
+
  static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
   struct kfd_ioctl_pc_sample_args
__user *user_args)  {
- return -EINVAL;
+ uint64_t sample_offset;
+ int num_method = 0;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+ if (KFD_GC_VERSION(pdd->dev) ==
supported_formats[i].ip_version)
+ num_method++;
+
+ if (!num_method) {
+ pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+ pdd->dev->adev->ip_versions[GC_HWIP][0]);
+ return -EOPNOTSUPP;
+ }
+
+ if (!user_args->sample_info_ptr) {

Should be:
if (!user_args->sample_info_ptr || !user_args->num_sample_info) {


+ user_args->num_sample_info = num_method;
+ return 0;
+ }
+
+ if (user_args->num_sample_info < num_method) {
+ user_args->num_sample_info = num_method;
+ pr_debug("Sample info buffer is not large enough, "
+  "ASIC requires space for %d kfd_pc_sample_info
entries.", num_method);
+ return -ENOSPC;
+ }
+
+ sample_offset = user_args->sample_info_ptr;

If another PC Sampling session is already active, I thought we were planning
to have code to return a reduced list with only the methods that are
compatible with the currently active session. Did we decide to drop this
behavior?

[JZ] Has the design changed here? I thought we allow sharing the same
active PC Sampling session between multiple processes.


Regards,
David


+ for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+ if (KFD_GC_VERSION(pdd->dev) ==
supported_formats[i].ip_version) {
+ int ret = copy_to_user((void __user *) sample_offset,
+ supported_formats[i].sample_info,
sizeof(struct kfd_pc_sample_info));
+ if (ret) {
+ pr_debug("Failed to copy PC sampling info to
user.");
+ return -EFAULT;
+ }
+ sample_offset += sizeof(struct kfd_pc_sample_info);
+ }
+ }
+
+ return 0;
  }

  static int kfd_pc_sample_start(struct kfd_process_device *pdd)
--
2.25.1

Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-13 Thread James Zhu

Ping ...

On 2023-12-08 18:01, James Zhu wrote:

When an application tries to allocate all system memory and causes memory
to be swapped out, hmm_range_fault needs more time to validate the
remaining pages for allocation. To be safe, increase the timeout value to
1 second per 64MB of range.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
hmm_range->start, hmm_range->end);
  
-		/* Assuming 128MB takes maximum 1 second to fault page address */

-   timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+   /* Assuming 64MB takes maximum 1 second to fault page address */
+   timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
timeout = jiffies + msecs_to_jiffies(timeout);
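
For reference, the effect of changing the shift from 27 to 26 can be worked
through with a small user-space model of the arithmetic in the hunk above.
model_timeout_ms() is a hypothetical helper, and the value 1000 (ms) used
for the default timeout is an assumption taken from the "1 second" wording
of the comment, not from this patch.

```c
#include <assert.h>

/* Assumed default timeout in milliseconds (see the note above). */
#define MODEL_HMM_RANGE_DEFAULT_TIMEOUT 1000UL

/*
 * Timeout in ms for a range: one default timeout per 2^shift bytes,
 * with a floor of one unit, as in max((end - start) >> shift, 1UL).
 */
static unsigned long model_timeout_ms(unsigned long start, unsigned long end,
				      unsigned int shift)
{
	unsigned long units;

	assert(end >= start);
	units = (end - start) >> shift;
	if (units < 1UL)
		units = 1UL;
	return units * MODEL_HMM_RANGE_DEFAULT_TIMEOUT;
}
```

Under that assumption a 1GB range gets 8 units with a shift of 27 and 16
units with a shift of 26, so the patch doubles the budget for large ranges
while ranges below 64MB keep the single-unit floor.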
  


[PATCH v2 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-11 Thread James Zhu
Only call schedule() when hmm_range_fault() returns an error.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index b24eb5821fd1..55b65fc04b65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
r = hmm_range_fault(hmm_range);
if (unlikely(r)) {
+   schedule();
/*
 * FIXME: This timeout should encompass the retry from
 * mmu_interval_read_retry() as well.
@@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
break;
hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
hmm_range->start = hmm_range->end;
-   schedule();
} while (hmm_range->end < end);
 
hmm_range->start = start;
-- 
2.25.1



[PATCH v3 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-11 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.

The user mode code which uses this new kfd_ioctl is linked to
https://github.com/zhums/ROCT-Thunk-Interface/tree/zhums/ROCT-Thunk.

David Yat Sin (4):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   44 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  372 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |6 +
 include/uapi/linux/kfd_ioctl.h|   60 +-
 19 files changed, 1813 insertions(+), 1059 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



[PATCH v3 07/23] drm/amdkfd: check pcs_entry valid

2023-12-11 Thread James Zhu
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b44dfea15539..e5aa87b2da4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = 
idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry is only for this pc sampling process,
+* which has kfd_process->mutex protected here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1
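
The per-operation checks added by this patch amount to a small state table,
which can be modelled in user-space C as below. This is a sketch only:
struct model_entry, enum model_op and model_check_op() are hypothetical
names, the owner pointer stands in for pcs_entry->pdd, and the IDR lookup
is replaced by passing the entry (or NULL) directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_op { MODEL_OP_DESTROY, MODEL_OP_START, MODEL_OP_STOP };

struct model_entry {
	bool enabled;      /* true between START and STOP */
	const void *owner; /* stands in for pcs_entry->pdd */
};

/* Returns 0 when the op may proceed, or the errno the patch would return. */
static int model_check_op(const struct model_entry *entry,
			  const void *caller, enum model_op op)
{
	assert(caller != NULL);

	/* Unknown trace_id, or an entry owned by another process device. */
	if (!entry || entry->owner != caller)
		return -EINVAL;

	switch (op) {
	case MODEL_OP_DESTROY:
		return entry->enabled ? -EBUSY : 0;
	case MODEL_OP_START:
		return entry->enabled ? -EALREADY : 0;
	case MODEL_OP_STOP:
		return entry->enabled ? 0 : -EALREADY;
	}
	return -EINVAL;
}
```

In this model a session has to be stopped before it can be destroyed, and a
repeated START or STOP is reported as -EALREADY rather than being silently
accepted.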



Re: [PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-11 Thread James Zhu

Ping ...

On 2023-12-07 17:53, James Zhu wrote:

PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.

David Yat Sin (4):
   drm/amdkfd/kfd_ioctl: add pc sampling support
   drm/amdkfd: add pc sampling support
   drm/amdkfd: enable pc sampling query
   drm/amdkfd: enable pc sampling create

James Zhu (19):
   drm/amdkfd: add pc sampling mutex
   drm/amdkfd: add trace_id return
   drm/amdkfd: check pcs_enrty valid
   drm/amdkfd: enable pc sampling destroy
   drm/amdkfd: add interface to trigger pc sampling trap
   drm/amdkfd: trigger pc sampling trap for gfx v9
   drm/amdkfd/gfx9: enable host trap
   drm/amdgpu: use trapID 4 for host trap
   drm/amdgpu: add sq host trap status check
   drm/amdkfd: trigger pc sampling trap for arcturus
   drm/amdkfd: trigger pc sampling trap for aldebaran
   drm/amdkfd: use bit operation set debug trap
   drm/amdkfd: add setting trap pc sampling flag
   drm/amdkfd: enable pc sampling stop
   drm/amdkfd: add queue remapping
   drm/amdkfd: enable pc sampling start
   drm/amdkfd: add pc sampling thread to trigger trap
   drm/amdkfd: add pc sampling release when process release
   drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
  .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
  drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
  .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
  .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   44 +
  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
  .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
  .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  372 +++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   43 +
  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
  .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
  .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
  .../gpu/drm/amd/include/kgd_kfd_interface.h   |6 +
  include/uapi/linux/kfd_ioctl.h|   60 +-
  19 files changed, 1813 insertions(+), 1059 deletions(-)
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h



Re: [PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-11 Thread James Zhu



On 2023-12-11 05:38, Christian König wrote:

Am 09.12.23 um 00:01 schrieb James Zhu:

There is no need to call schedule() after every hmm_range_fault; use
cond_resched instead of schedule.


cond_resched() is usually NAKed upstream since it is a NO-OP in most situations.

[JZ] then let me change back to schedule(); Thanks!


IIRC there was even a patch set to completely remove it.

Christian.



Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c

index b24eb5821fd1..c77c4eceea46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,

  hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
  r = hmm_range_fault(hmm_range);
  if (unlikely(r)) {
+    cond_resched();
  /*
   * FIXME: This timeout should encompass the retry from
   * mmu_interval_read_retry() as well.
@@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct 
mmu_interval_notifier *notifier,

  break;
  hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
  hmm_range->start = hmm_range->end;
-    schedule();
  } while (hmm_range->end < end);
    hmm_range->start = start;




[PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-08 Thread James Zhu
When an application tries to allocate all of system memory and causes memory
to be swapped out, hmm_range_fault needs more time to validate the remaining
pages for the allocation. To be safe, increase the timeout value to 1 second
per 64MB of range.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
hmm_range->start, hmm_range->end);
 
-   /* Assuming 128MB takes maximum 1 second to fault page address 
*/
-   timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+   /* Assuming 64MB takes maximum 1 second to fault page address */
+   timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
timeout = jiffies + msecs_to_jiffies(timeout);
 
-- 
2.25.1
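
As a worked example of the arithmetic in this patch: the range size is shifted right by 26 (64MB units) instead of 27 (128MB units), clamped to at least 1, and multiplied by the base timeout. The sketch below reproduces that calculation in plain userspace C. The helper name `range_timeout_ms` is hypothetical, and the base timeout of 1000 ms is an assumption inferred from the "1 second" wording in the code comment, not a value taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base timeout in milliseconds (inferred from the "1 second" comment). */
#define ASSUMED_BASE_TIMEOUT_MS UINT64_C(1000)

/*
 * Hypothetical helper: timeout in milliseconds for faulting [start, end).
 * One base timeout is granted per 2^shift bytes, with a minimum of one.
 */
static uint64_t range_timeout_ms(uint64_t start, uint64_t end, unsigned int shift)
{
	uint64_t units;

	assert(end >= start);
	assert(shift < 64);

	units = (end - start) >> shift;
	if (units < 1)
		units = 1;	/* mirrors max(..., 1UL) in the patch */

	return units * ASSUMED_BASE_TIMEOUT_MS;
}
```

Under these assumptions a 1GB range gets 8 seconds with the old shift of 27 and 16 seconds with the new shift of 26, while a range smaller than one unit still gets the 1 second minimum.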



[PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-08 Thread James Zhu
There is no need to call schedule() after every hmm_range_fault; use
cond_resched instead of schedule.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index b24eb5821fd1..c77c4eceea46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
r = hmm_range_fault(hmm_range);
if (unlikely(r)) {
+   cond_resched();
/*
 * FIXME: This timeout should encompass the retry from
 * mmu_interval_read_retry() as well.
@@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier 
*notifier,
break;
hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
hmm_range->start = hmm_range->end;
-   schedule();
} while (hmm_range->end < end);
 
hmm_range->start = start;
-- 
2.25.1



[PATCH v2 22/23] drm/amdkfd: add pc sampling release when process release

2023-12-07 Thread James Zhu
Add pc sampling release on process release; it forces all active
sessions belonging to this process to stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 04cc25c79a76..a05dd8b1a7da 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -300,6 +300,27 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+   struct pc_sampling_entry *pcs_entry;
+   struct idr *idp;
+   uint32_t id;
+
+   /* force to release all PC sampling task for this process */
+   idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   idr_for_each_entry(idp, pcs_entry, id) {
+   if (pcs_entry->pdd != pdd)
+   continue;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   if (pcs_entry->enabled)
+   kfd_pc_sample_stop(pdd, pcs_entry);
+   kfd_pc_sample_destroy(pdd, id, pcs_entry);
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6bc9dcfad484..1f8d6098dfb2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process 
*p)
pr_debug("Releasing pdd (topology id %d) for process (pasid 
0x%x)\n",
pdd->dev->id, p->pasid);
 
+   kfd_pc_sample_release(pdd);
+
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1



[PATCH v2 23/23] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2023-12-07 Thread James Zhu
Bump the minor version to declare that the pc sampling feature is now
available.

Signed-off-by: James Zhu 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1bd1347effea..62d8642d3d1c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -40,9 +40,10 @@
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
+ * - 1.15 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 14
+#define KFD_IOCTL_MINOR_VERSION 15
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1
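
Userspace can use the bumped minor version to decide whether the PC sampling ioctl may be used. The following is a minimal sketch, assuming the version pair has already been read from the driver (for example through the existing AMDKFD_IOC_GET_VERSION ioctl, which fills struct kfd_ioctl_get_version_args). The helper name is hypothetical, and treating any major version other than 1 as unsupported is an assumption of this sketch, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Version at which this patch declares PC sampling available. */
#define PCS_MAJOR_VERSION 1u
#define PCS_MINOR_VERSION 15u

/*
 * Hypothetical helper: returns true when the reported KFD ioctl version
 * advertises PC sampling. A different major version is treated as
 * "unknown, do not use" (assumption of this sketch).
 */
static bool kfd_version_has_pc_sampling(uint32_t major, uint32_t minor)
{
	if (major != PCS_MAJOR_VERSION)
		return false;
	return minor >= PCS_MINOR_VERSION;
}
```

A caller would pass the major_version and minor_version fields returned by the driver and fall back to running without PC sampling when the helper returns false.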



[PATCH v2 21/23] drm/amdkfd: add pc sampling thread to trigger trap

2023-12-07 Thread James Zhu
Add a kthread to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 49b5d4c9f7e0..04cc25c79a76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,66 @@ struct supported_pc_sample_info supported_formats[] = {
{ IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+   struct amdgpu_device *adev;
+   struct kfd_node *node = param;
+   uint32_t timeout = 0;
+
+   mutex_lock(&node->pcs_data.mutex);
+   if (node->pcs_data.hosttrap_entry.base.active_count &&
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+   node->kfd2kgd->trigger_pc_sample_trap) {
+   switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) 
{
+   case KFD_IOCTL_PCS_TYPE_TIME_US:
+   timeout = 
(uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+   break;
+   default:
+   pr_debug("PC Sampling type %d not supported.",
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+   }
+   }
+   mutex_unlock(&node->pcs_data.mutex);
+   if (!timeout)
+   return -EINVAL;
+
+   adev = node->adev;
+
+   allow_signal(SIGKILL);
+   while (!kthread_should_stop() ||
+   
!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {
+   node->kfd2kgd->trigger_pc_sample_trap(adev, 
node->vm_info.last_vmid_kfd,
+   &node->pcs_data.hosttrap_entry.base.target_simd,
+   
&node->pcs_data.hosttrap_entry.base.target_wave_slot,
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+   pr_debug_ratelimited("triggered a host trap.");
+
+   if 
(signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread))
+   break;
+   usleep_range(timeout, timeout + 10);
+   }
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+   return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+   char thread_name[32];
+   int ret = 0;
+
+   snprintf(thread_name, 32, "pc_sampling_%08x", node->id);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+   kthread_run(kfd_pc_sample_thread, node, thread_name);
+   if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   ret = 
PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+   pr_debug("Failed to create pc sample thread for %s.\n", 
thread_name);
+   }
+
+   return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
@@ -88,6 +148,7 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
struct pc_sampling_entry *pcs_entry)
 {
bool pc_sampling_start = false;
+   int ret = 0;
 
pcs_entry->enabled = true;
mutex_lock(&pdd->dev->pcs_data.mutex);
@@ -102,11 +163,13 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
} else {
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   if 
(!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   ret = kfd_pc_sample_thread_start(pdd->dev);
break;
}
}
 
-   return 0;
+   return ret;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -124,6 +187,9 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
if (pc_sampling_stop) {
+   
kthread_stop(pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   while (pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   usleep_range(1000, 2000);
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
remap_queue(pdd->dev->dqm,
d

[PATCH v2 12/23] drm/amdgpu: use trapID 4 for host trap

2023-12-07 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use TTMP1
(bit 24: HT, bits 16-23: trapID) to identify the host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 +
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 +
 3 files changed, 1070 insertions(+), 1054 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 7d8c0e13ac12..adfe5e5585e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
/* select *target_wave_slot */
value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+   /* set TrapID 4 for HOSTTRAP */
+   value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4);
 
mutex_lock(&adev->grbm_idx_mutex);
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 747426bd5181..44955838f307 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf82025e,
+   0xbf820001, 0xbf820263,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
-   0x00ff, 0xbf85001e,
+   0x00ff, 0xbf850023,
0x866eff7b, 0x0400,
-   0xbf85005b, 0xbf8e0010,
+   0xbf850060, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
-   0xbf850015, 0x866eff7b,
-   0x71ff, 0xbf840008,
-   0x866fff7b, 0x7080,
-   0xbf840001, 0xbeee1a87,
-   0xb8eff801, 0x8e6e8c6e,
-   0x866e6f6e, 0xbf85000a,
-   0x866eff6d, 0x00ff,
-   0xbf850007, 0xb8eef801,
-   0x866eff6e, 0x0800,
-   0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850040,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xbf0d8f7b, 0xbf840002,
-   0x877bff7b, 0x,
-   0xc0031c3d, 0x0010,
-   0xc0071bbd, 0x,
-   0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x8671ff6d,
-   0x0100, 0xbf840004,
-   0x92f1ff70, 0x00010001,
-   0xbf840016, 0xbf820005,
-   0x86708170, 0x8e709770,
-   0x8977ff77, 0x0080,
-   0x8077, 0x86ee6e6e,
-   0xbf840001, 0xbe801d6e,
-   0x866eff6d, 0x01ff,
-   0xbf850005, 0x8778ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x866eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f7a8b77,
+   0xbf85001a, 0x866eff6d,
+   0x01ff, 0xbf06ff6e,
+   0x0104, 0xbf850015,
+   0x866eff7b, 0x71ff,
+   0xbf840008, 0x866fff7b,
+   0x7080, 0xbf840001,
+   0xbeee1a87, 0xb8eff801,
+   0x8e6e8c6e, 0x866e6f6e,
+   0xbf85000a, 0x866eff6d,
+   0x00ff, 0xbf850007,
+   0xb8eef801, 0x866eff6e,
+   0x0800, 0xbf850003,
+   0x866eff7b, 0x0400,
+   0xbf850040, 0xb8faf807,
0x867aff7a, 0x001f8000,
-   0xb97af807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xbf0d8f7b,
+   0xbf840002, 0x877bff7b,
+   0x, 0xc0031c3d,
+   0x0010, 0xc0071bbd,
+   0x, 0xc0071ebd,
+   0x0008, 0xbf8cc07f,
+   0x8671ff6d, 0x0100,
+   0xbf840004, 0x92f1ff70,
+   0x00010001, 0xbf840016,
+   0xbf820005, 0x86708170,
+   0x8e709770, 0x8977ff77,
+   0x0080, 0x8077,
+   0x86ee6e6e, 0xbf840001,
+   0xbe801d6e, 0x866eff6d,
+   0x01ff, 0xbf850005,
+   0x8778ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820005, 0x866eff6d,
+   0x0100, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbf94, 0x877a8478,
-   0xb97af802, 0xbf8e0002,
-   0xbf88fffe, 0xb8fa2a05,
-   0x807a817a, 0x8e7a8
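
The TTMP1 layout named in the commit message of this patch (bit 24: HT, bits 16-23: trapID, with trapID 4 reserved for the host trap) can be illustrated with a small userspace decoder. This is a sketch only: the macro and function names are hypothetical, and requiring both the HT bit and trapID 4 is an assumption based on the commit message and the patch title, not on the trap handler source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as described in the commit message. */
#define TTMP1_TRAP_ID_SHIFT 16
#define TTMP1_TRAP_ID_MASK  0xffu
#define TTMP1_HT_BIT        (UINT32_C(1) << 24)

/* Trap ID that this patch uses for host traps. */
#define HOST_TRAP_ID        4u

/* Extract the 8-bit trap ID from bits 16-23. */
static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
	return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* Assumed check: HT bit set and trap ID equal to HOST_TRAP_ID. */
static bool ttmp1_is_host_trap(uint32_t ttmp1)
{
	return (ttmp1 & TTMP1_HT_BIT) != 0 &&
	       ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```

Bits outside the two fields are ignored, so unrelated content in the low 16 bits of the register does not affect the result.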

[PATCH v2 19/23] drm/amdkfd: add queue remapping

2023-12-07 Thread James Zhu
Add queue remapping to ensure that any waves executing the PC sampling
part of the trap handler are done before kfd_pc_sample_stop returns,
and that no new waves enter that part of the trap handler afterwards.
This avoids race conditions that could lead to use-after-free. Unmapping
and remapping the queues either waits for the waves to drain, or preempts
them with CWSR, which itself executes a trap and waits for previous traps
to finish.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager 
*dqm)
return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period)
+{
+   dqm_lock(dqm);
+   if (!dqm->dev->kfd->shared_resources.enable_mes)
+   execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+   dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 29a6f9f40f83..7d0722498bf5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
uint32_t ip_version;
@@ -105,6 +106,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
if (pc_sampling_stop) {
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+   remap_queue(pdd->dev->dqm,
+   KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);
 
mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
-- 
2.25.1



[PATCH v2 15/23] drm/amdkfd: trigger pc sampling trap for aldebaran

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device 
*adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1



[PATCH v2 16/23] drm/amdkfd: use bit operation set debug trap

2023-12-07 Thread James Zhu
The 2nd byte of the 1st-level TMA is used for the trap type setting;
use bit operations so that only the selected bit is changed.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 71df51fcc1b0..1a31b556a5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1440,13 +1440,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool 
supported)
return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+   KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */
+   KFD_TRAP_TYPE_HOST,
+   KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled)
 {
if (qpd->cwsr_kaddr) {
-   uint64_t *tma =
-   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[2] = enabled;
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+   else
+   clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
}
 }
 
-- 
2.25.1



[PATCH v2 18/23] drm/amdkfd: enable pc sampling stop

2023-12-07 Thread James Zhu
Enable pc sampling stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 18fe06d712c5..29a6f9f40f83 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -88,10 +88,32 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd)
return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_stop = false;
+
+   pcs_entry->enabled = false;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
true);
+   pc_sampling_stop = true;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   if (pc_sampling_stop) {
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+   pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
false);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -233,7 +255,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (!pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_stop(pdd);
+   return kfd_pc_sample_stop(pdd, pcs_entry);
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b9a36891d099..0839a0ca3099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   uint32_t active_count;  /* Num of active sessions */
+   uint32_t target_simd;   /* target simd for trap */
+   uint32_t target_wave_slot;  /* target wave slot for trap */
+   bool stop_enable;   /* pc sampling stop in process */
struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1



[PATCH v2 17/23] drm/amdkfd: add setting trap pc sampling flag

2023-12-07 Thread James Zhu
Add setting trap pc sampling flag.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7ca7cc726246..b9a36891d099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 1a31b556a5ff..6bc9dcfad484 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1460,6 +1460,19 @@ void kfd_process_set_trap_debug_flag(struct 
qcm_process_device *qpd,
}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(method, &tma[2]);
+   else
+   clear_bit(method, &tma[2]);
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1



[PATCH v2 20/23] drm/amdkfd: enable pc sampling start

2023-12-07 Thread James Zhu
Enable pc sampling start.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 7d0722498bf5..49b5d4c9f7e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -84,9 +84,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_start = false;
+
+   pcs_entry->enabled = true;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+   pc_sampling_start = true;
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   while (pc_sampling_start) {
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
+   usleep_range(1000, 2000);
+   } else {
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   break;
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -252,7 +272,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_start(pdd);
+   return kfd_pc_sample_start(pdd, pcs_entry);
 
case KFD_IOCTL_PCS_OP_STOP:
if (!pcs_entry->enabled)
-- 
2.25.1



[PATCH v2 11/23] drm/amdkfd/gfx9: enable host trap

2023-12-07 Thread James Zhu
Enable host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h 
b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index df75863393fc..747426bd5181 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820258,
+   0xbf820001, 0xbf82025e,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d4,
+   0xbf820001, 0xbf8202da,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202df,
+   0xbf820001, 0xbf8202e5,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f

[PATCH v2 13/23] drm/amdgpu: add sq host trap status check

2023-12-07 Thread James Zhu
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+   uint32_t sq_hosttrap_status = 0x0;
+   int i, j;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+   amdgpu_gfx_select_se_sh(adev, i, j, 0xffffffff, 0);
+   sq_hosttrap_status = RREG32_SOC15(GC, 0, 
mmSQ_HOSTTRAP_STATUS);
+
+   if (sq_hosttrap_status & 
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+   WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+   
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+   sq_hosttrap_status = 0x0;
+   continue;
+   }
+   if (sq_hosttrap_status)
+   goto out;
+   }
+   }
+
+out:
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
uint32_t vmid,
uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
 {
if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
uint32_t value = 0;
+   uint32_t sq_hosttrap_status = 0x0;
+
+   sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+   /* skip when last host trap request is still pending to 
complete */
+   if (sq_hosttrap_status)
+   return 0;
 
value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 
0
 #define mmSQ_CMD   
0x037b
 #define mmSQ_CMD_BASE_IDX  
0
+#define mmSQ_HOSTTRAP_STATUS   
0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX  
0
 #define mmSQ_TIME_HI   
0x037c
 #define mmSQ_TIME_HI_BASE_IDX  
0
 #define mmSQ_TIME_LO   
0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h 
b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 
   0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK   
   0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT  
   0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT  
   0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK
   0x00FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK

[PATCH v2 09/23] drm/amdkfd: add interface to trigger pc sampling trap

2023-12-07 Thread James Zhu
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h 
b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..05b0255aca37 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -33,6 +33,7 @@
 #include 
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
+#include 
 
 struct pci_dev;
 struct amdgpu_device;
@@ -318,6 +319,11 @@ struct kfd2kgd_calls {
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
+   uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1



[PATCH v2 14/23] drm/amdkfd: trigger pc sampling trap for arcturus

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct 
amdgpu_device *adev,
 
return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1



[PATCH v2 08/23] drm/amdkfd: enable pc sampling destroy

2023-12-07 Thread James Zhu
Enable pc sampling destroy.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e5aa87b2da4f..18fe06d712c5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -169,10 +169,24 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+   pcs_entry, trace_id, pdd->dev->id);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+   idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id);
 
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kvfree(pcs_entry);
+
+   return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -207,7 +221,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EBUSY;
else
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   return kfd_pc_sample_destroy(pdd, args->trace_id, 
pcs_entry);
 
case KFD_IOCTL_PCS_OP_START:
if (pcs_entry->enabled)
-- 
2.25.1



[PATCH v2 10/23] drm/amdkfd: trigger pc sampling trap for gfx v9

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for gfx v9.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7 
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+   uint32_t value = 0;
+
+   value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+   value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
+
+   /* select *target_simd */
+   value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+   /* select *target_wave_slot */
+   value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   *target_wave_slot %= max_wave_slot;
+   if (!(*target_wave_slot)) {
+   (*target_simd)++;
+   *target_simd %= max_simd;
+   }
+   } else {
+   pr_debug("PC Sampling method %d not supported.", method);
+   return -EOPNOTSUPP;
+   }
+   return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct 
amdgpu_device *adev,
   uint32_t grace_period,
   uint32_t *reg_offset,
   uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method);
-- 
2.25.1
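
The target selection in kgd_gfx_v9_trigger_pc_sample_trap() walks the wave
slots of one SIMD and moves to the next SIMD when the slot counter wraps. A
minimal standalone sketch of that cursor arithmetic, assuming nothing beyond
what the patch shows (pcs_advance_target is a hypothetical name, not a
function in the series):

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the cursor update in the patch:
 * the wave slot advances by one modulo max_wave_slot, and the SIMD
 * advances by one modulo max_simd each time the wave slot wraps to 0.
 */
static void pcs_advance_target(uint32_t *target_simd,
                               uint32_t *target_wave_slot,
                               uint32_t max_simd,
                               uint32_t max_wave_slot)
{
    *target_wave_slot = (*target_wave_slot + 1) % max_wave_slot;
    if (*target_wave_slot == 0)
        *target_simd = (*target_simd + 1) % max_simd;
}
```

With the values the arcturus wrapper passes (max_wave_slot = 10, max_simd = 4),
the cursor returns to (simd 0, slot 0) after 40 triggers.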



[PATCH v2 06/23] drm/amdkfd: add trace_id return

2023-12-07 Thread James Zhu
Return a trace_id for each new per-device pc sampling session.
Use an IDR to quickly look up the pc_sampling_entry by trace_id.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
mutex_init(&dev->pcs_data.mutex);
+   idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }
 
 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+   idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
mutex_destroy(&dev->pcs_data.mutex);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 7828a6340edf..b44dfea15539 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
 {
struct kfd_pc_sample_info *supported_format = NULL;
struct kfd_pc_sample_info user_info;
+   struct pc_sampling_entry *pcs_entry;
int ret;
int i;
 
@@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return ret ? -EFAULT : -EEXIST;
}
 
-   /* TODO: add trace_id return */
+   pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+   if (!pcs_entry) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return -ENOMEM;
+   }
+
+   i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   pcs_entry, 1, 0, GFP_KERNEL);
+   if (i < 0) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   kfree(pcs_entry);
+   return i;
+   }
 
if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
@@ -148,6 +161,11 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   pcs_entry->pdd = pdd;
+   user_args->trace_id = (uint32_t)i;
+
+   pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index db2d09db8000..7ca7cc726246 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
 
@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100
 
+struct pc_sampling_entry {
+   bool enabled;
+   struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
/* The device that owns this data. */
-- 
2.25.1



[PATCH v2 03/23] drm/amdkfd: enable pc sampling query

2023-12-07 Thread James Zhu
From: David Yat Sin 

Enable pc sampling to query system capability.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..49fecbc7013e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,62 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+   uint32_t ip_version;
+   const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+   0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, 
KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+   { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+   { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   uint64_t sample_offset;
+   int num_method = 0;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+   num_method++;
+
+   if (!num_method) {
+   pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+   pdd->dev->adev->ip_versions[GC_HWIP][0]);
+   return -EOPNOTSUPP;
+   }
+
+   if (!user_args->sample_info_ptr) {
+   user_args->num_sample_info = num_method;
+   return 0;
+   }
+
+   if (user_args->num_sample_info < num_method) {
+   user_args->num_sample_info = num_method;
+   pr_debug("Sample info buffer is not large enough, "
+"ASIC requires space for %d kfd_pc_sample_info 
entries.", num_method);
+   return -ENOSPC;
+   }
+
+   sample_offset = user_args->sample_info_ptr;
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == 
supported_formats[i].ip_version) {
+   int ret = copy_to_user((void __user *) sample_offset,
+   supported_formats[i].sample_info, sizeof(struct 
kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info to 
user.");
+   return -EFAULT;
+   }
+   sample_offset += sizeof(struct kfd_pc_sample_info);
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1
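
The query path above follows a two-call sizing convention: a call with a null
sample_info_ptr reports how many entries exist, and a call with a buffer that
is too small reports the required count and fails with -ENOSPC. A minimal
user-space mock of that convention, as a sketch only (mock_query_cap and
struct sample_info are hypothetical stand-ins, not kernel symbols):

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in with the same field order as struct kfd_pc_sample_info. */
struct sample_info {
    uint64_t interval;
    uint64_t interval_min;
    uint64_t interval_max;
    uint64_t flags;
    uint32_t method;
    uint32_t type;
};

/*
 * Hypothetical mock of the sizing protocol:
 *  - no supported formats       -> -EOPNOTSUPP
 *  - out == NULL                -> report the count, return 0
 *  - *num_out too small         -> report the count, return -ENOSPC
 *  - otherwise                  -> copy all supported entries, return 0
 */
static int mock_query_cap(const struct sample_info *supported,
                          uint32_t num_supported,
                          struct sample_info *out, uint32_t *num_out)
{
    if (num_supported == 0)
        return -EOPNOTSUPP;

    if (out == NULL) {
        *num_out = num_supported;
        return 0;
    }

    if (*num_out < num_supported) {
        *num_out = num_supported;
        return -ENOSPC;
    }

    memcpy(out, supported, num_supported * sizeof(*supported));
    return 0;
}
```

A caller would typically query once with a null buffer, allocate the reported
number of entries, and then query again.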



[PATCH v2 05/23] drm/amdkfd: enable pc sampling create

2023-12-07 Thread James Zhu
From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 49fecbc7013e..7828a6340edf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,58 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   struct kfd_pc_sample_info *supported_format = NULL;
+   struct kfd_pc_sample_info user_info;
+   int ret;
+   int i;
+
+   if (user_args->num_sample_info != 1)
+   return -EINVAL;
+
+   ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr,
+   sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info from user\n");
+   return -EFAULT;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+   && user_info.method == 
supported_formats[i].sample_info->method
+   && user_info.type == 
supported_formats[i].sample_info->type
+   && user_info.interval <= 
supported_formats[i].sample_info->interval_max
+   && user_info.interval >= 
supported_formats[i].sample_info->interval_min) {
+   supported_format =
+   (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;
+   break;
+   }
+   }
+
+   if (!supported_format) {
+   pr_debug("Sampling format is not supported!");
+   return -EOPNOTSUPP;
+   }
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+   memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   &user_info, sizeof(user_info))) {
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : -EEXIST;
+   }
+
+   /* TODO: add trace_id return */
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
+
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbaa1bccd94b..db2d09db8000 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+   uint32_t use_count; /* Num of PC sampling sessions */
+   struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+   struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
struct mutex mutex;
+   struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1
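
The create path accepts a request only if its method and type match a
supported format and its interval lies inside that format's
[interval_min, interval_max] range. A standalone sketch of that predicate,
under the assumption that the field meanings are as documented in the uapi
patch (pcs_request_matches and struct sample_info are hypothetical names):

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in with the same field order as struct kfd_pc_sample_info. */
struct sample_info {
    uint64_t interval;
    uint64_t interval_min;
    uint64_t interval_max;
    uint64_t flags;
    uint32_t method;
    uint32_t type;
};

/*
 * Hypothetical predicate: true when the request uses the same method and
 * type as the supported format and its interval is within the format's
 * inclusive bounds.
 */
static bool pcs_request_matches(const struct sample_info *req,
                                const struct sample_info *fmt)
{
    return req->method == fmt->method &&
           req->type == fmt->type &&
           req->interval >= fmt->interval_min &&
           req->interval <= fmt->interval_max;
}
```

Against the host trap format in this series (interval_min = 1,
interval_max = ~0ULL), an interval of 0 would be rejected.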



[PATCH v2 07/23] drm/amdkfd: check pcs_enrty valid

2023-12-07 Thread James Zhu
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b44dfea15539..e5aa87b2da4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry is only for this pc sampling process,
+* which has kfd_process->mutex protected here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1
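
The switch above gates DESTROY, START and STOP on the session's enabled flag.
A compact sketch of that gating as a pure function, for illustration only
(the enum and pcs_check_state are hypothetical; the error codes are the ones
the patch returns):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical local mirror of the ioctl operations, in uapi order. */
enum pcs_op {
    PCS_OP_QUERY_CAPABILITIES,
    PCS_OP_CREATE,
    PCS_OP_DESTROY,
    PCS_OP_START,
    PCS_OP_STOP,
};

/*
 * Returns 0 when the operation may proceed for a session whose
 * current state is `enabled`, or a negative errno otherwise.
 */
static int pcs_check_state(enum pcs_op op, bool enabled)
{
    switch (op) {
    case PCS_OP_DESTROY:
        return enabled ? -EBUSY : 0;     /* must stop before destroy */
    case PCS_OP_START:
        return enabled ? -EALREADY : 0;  /* already sampling */
    case PCS_OP_STOP:
        return enabled ? 0 : -EALREADY;  /* already stopped */
    default:
        return 0;                        /* query/create have no state */
    }
}
```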



[PATCH v2 04/23] drm/amdkfd: add pc sampling mutex

2023-12-07 Thread James Zhu
Add a per-node pc sampling mutex; initialize it in node init and destroy it
in node cleanup.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+   mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+   mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
}
 
kfd_smi_init(node);
+   kfd_pc_sampling_init(node);
 
return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned 
int num_nodes)
kfd_topology_remove_device(knode);
if (knode->gws)
amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+   kfd_pc_sampling_exit(knode);
kfree(knode);
kfd->nodes[i] = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 99426182bfc6..cbaa1bccd94b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+   struct mutex mutex;
+};
+
 struct kfd_node {
unsigned int node_id;
struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
struct kfd_local_mem_info local_mem_info;
 
struct kfd_dev *kfd;
+
+   struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1



[PATCH v2 01/23] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-12-07 Thread James Zhu
From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

The user-mode code that uses this new kfd_ioctl is available at
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
on the master branch.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 include/uapi/linux/kfd_ioctl.h | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f0ed68974c54..1bd1347effea 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a 
per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously 
registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking samples from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking samples from a 
previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 interval;  /* [IN] if PCS_TYPE_INTERVAL_US: sample interval 
in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in 
graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in 
instructions issued by
+ * graphics compute units
+ */
+   __u64 interval_min;  /* [OUT] */
+   __u64 interval_max;  /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g 
FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP\
AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28
 
 #endif
-- 
2.25.1
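
As a rough illustration of how user space might populate the new argument
structures for a CREATE request, here is a sketch that mirrors the uapi layout
locally so it compiles without kernel headers. The *_sketch types, the PCS_*
constants and pcs_fill_create_args are hypothetical names; only the field
layout and enum values are taken from the patch above, and the actual ioctl
call on the KFD device is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uapi structs from the patch (same field order). */
struct kfd_pc_sample_info_sketch {
    uint64_t interval;
    uint64_t interval_min;
    uint64_t interval_max;
    uint64_t flags;
    uint32_t method;
    uint32_t type;
};

struct kfd_ioctl_pc_sample_args_sketch {
    uint64_t sample_info_ptr;
    uint32_t num_sample_info;
    uint32_t op;
    uint32_t gpu_id;
    uint32_t trace_id;
};

/* Values as defined by the enums in the patch. */
enum {
    PCS_OP_CREATE = 1,       /* KFD_IOCTL_PCS_OP_CREATE */
    PCS_METHOD_HOSTTRAP = 1, /* KFD_IOCTL_PCS_METHOD_HOSTTRAP */
    PCS_TYPE_TIME_US = 0,    /* KFD_IOCTL_PCS_TYPE_TIME_US */
};

/*
 * Hypothetical helper: prepare a CREATE request for host trap sampling
 * with a time interval in microseconds. Both structs are zeroed first so
 * that fields the caller does not set hold a known value.
 */
static void pcs_fill_create_args(struct kfd_ioctl_pc_sample_args_sketch *args,
                                 struct kfd_pc_sample_info_sketch *info,
                                 uint32_t gpu_id, uint64_t interval_us)
{
    memset(info, 0, sizeof(*info));
    info->interval = interval_us;
    info->method = PCS_METHOD_HOSTTRAP;
    info->type = PCS_TYPE_TIME_US;

    memset(args, 0, sizeof(*args));
    args->sample_info_ptr = (uint64_t)(uintptr_t)info;
    args->num_sample_info = 1; /* the create path requires exactly one */
    args->op = PCS_OP_CREATE;
    args->gpu_id = gpu_id;
}
```

On success the kernel side is expected to write the session handle back into
trace_id, which later DESTROY, START and STOP requests would carry.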



[PATCH v2 02/23] drm/amdkfd: add pc sampling support

2023-12-07 Thread James Zhu
From: David Yat Sin 

Add pc sampling functions in amdkfd.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 
 5 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index a5ae7bcf44eb..790fd028a681 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -57,7 +57,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_debug.o
+   $(AMDKFD_PATH)/kfd_debug.o \
+   $(AMDKFD_PATH)/kfd_pc_sampling.o
 
 ifneq ($(CONFIG_DEBUG_FS),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f6d4748c1980..1a3a8ded9c93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -41,6 +41,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_svm.h"
+#include "kfd_pc_sampling.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
@@ -1750,6 +1751,38 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int kfd_ioctl_pc_sample(struct file *filep,
+  struct kfd_process *p, void __user *data)
+{
+   struct kfd_ioctl_pc_sample_args *args = data;
+   struct kfd_process_device *pdd;
+   int ret;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("PC Sampling does not support sched_policy %i", 
sched_policy);
+   return -EINVAL;
+   }
+
+   mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+
+   if (!pdd) {
+   pr_debug("could not find gpu id 0x%x.", args->gpu_id);
+   ret = -EINVAL;
+   } else {
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
+   if (IS_ERR(pdd)) {
+   pr_debug("failed to bind process %p with gpu id 0x%x", 
p, args->gpu_id);
+   ret = -ESRCH;
+   } else {
+   ret = kfd_pc_sample(pdd, args);
+   }
+   }
+   mutex_unlock(&p->mutex);
+
+   return ret;
+}
+
 static int criu_checkpoint_process(struct kfd_process *p,
 uint8_t __user *user_priv_data,
 uint64_t *priv_offset)
@@ -3224,6 +3257,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
kfd_ioctl_set_debug_trap, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
+   kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT ARRAY_SIZE(amdkfd_ioctls)
@@ -3300,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
}
}
 
+   /* PC Sampling Monitor */
+   if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+   if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+   retcode = -EACCES;
+   goto err_i1;
+   }
+   }
+
if (cmd & (IOC_IN | IOC_OUT)) {
if (asize <= sizeof(stack_kdata)) {
kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
new file mode 100644
index ..a7e78ff42d07
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission

[PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-07 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.
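
As a rough illustration of what a profiler could do with the saved program
counters, the following user-space sketch buckets sampled PCs into a
histogram. The struct and function names are hypothetical and are not part
of this series; the sample format produced by the trap handler is not
described here, so plain 64-bit addresses are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram over a code range; not a driver structure. */
struct pc_histogram {
	uint64_t base;          /* first address covered by bucket 0 */
	uint64_t bucket_bytes;  /* width of each bucket in bytes */
	size_t nbuckets;        /* number of entries in counts[] */
	uint32_t *counts;       /* one hit counter per bucket */
};

/*
 * Add n sampled PCs to the histogram.
 * Returns the number of samples that fell outside the covered range.
 */
static size_t pc_histogram_add(struct pc_histogram *h,
			       const uint64_t *pcs, size_t n)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t idx;

		if (pcs[i] < h->base) {
			dropped++;
			continue;
		}
		idx = (pcs[i] - h->base) / h->bucket_bytes;
		if (idx >= h->nbuckets) {
			dropped++;
			continue;
		}
		h->counts[idx]++;
	}
	return dropped;
}
```

Buckets with the highest counts are where the sampled threads spent the
most time, which is the information a PC sampling profiler reports.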

David Yat Sin (4):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_enrty valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   44 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  372 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |6 +
 include/uapi/linux/kfd_ioctl.h|   60 +-
 19 files changed, 1813 insertions(+), 1059 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-11-27 Thread James Zhu


On 2023-11-27 14:11, Alex Deucher wrote:

On Fri, Nov 3, 2023 at 9:22 AM James Zhu  wrote:

From: David Yat Sin

Add pc sampling support in kfd_ioctl.

Co-developed-by: James Zhu
Signed-off-by: James Zhu
Signed-off-by: David Yat Sin

For any new IOCTL interfaces, please provide a link to the user mode
code branch which uses it in the patch description.

[JZ] will add, Thanks!

Thanks,

Alex


---
  include/uapi/linux/kfd_ioctl.h | 57 +-
  1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f0ed68974c54..5202e29c9560 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
 };
  };

+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a 
per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously 
registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking samples from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking samples from a 
previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 value; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval 
in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in 
graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in 
instructions issued by
+ * graphics compute units
+ */
+   __u64 value_min; /* [OUT] */
+   __u64 value_max; /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g 
FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+};
+
  #define AMDKFD_IOCTL_BASE 'K'
  #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
  #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
  #define AMDKFD_IOC_DBG_TRAP\
 AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)

+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
  #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28

  #endif
--
2.25.1
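
For reference, a minimal sketch of how user space might fill in the
arguments for a CREATE request, assuming the layout quoted in the patch
above. The type and constant names below are local stand-ins (a real
program would use the definitions from linux/kfd_ioctl.h on a kernel
carrying this series), and make_create_args() is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Local mirrors of the values quoted in the patch; names are shortened. */
enum { PCS_OP_QUERY_CAPABILITIES, PCS_OP_CREATE, PCS_OP_DESTROY,
       PCS_OP_START, PCS_OP_STOP };
enum { PCS_METHOD_HOSTTRAP = 1, PCS_METHOD_STOCHASTIC };
enum { PCS_TYPE_TIME_US, PCS_TYPE_CLOCK_CYCLES, PCS_TYPE_INSTRUCTIONS };

struct pc_sample_info {
	uint64_t value;      /* [IN] requested sampling interval */
	uint64_t value_min;  /* [OUT] */
	uint64_t value_max;  /* [OUT] */
	uint64_t flags;      /* [OUT] */
	uint32_t method;     /* [IN/OUT] */
	uint32_t type;       /* [IN/OUT] */
};

struct pc_sample_args {
	uint64_t sample_info_ptr;  /* user pointer to pc_sample_info array */
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
};

/* Hypothetical helper: build a host-trap CREATE request with a time interval. */
static struct pc_sample_args make_create_args(uint32_t gpu_id,
					      struct pc_sample_info *info,
					      uint64_t interval_us)
{
	struct pc_sample_args args;

	memset(info, 0, sizeof(*info));
	info->value = interval_us;
	info->method = PCS_METHOD_HOSTTRAP;
	info->type = PCS_TYPE_TIME_US;

	memset(&args, 0, sizeof(args));
	args.sample_info_ptr = (uint64_t)(uintptr_t)info;
	args.num_sample_info = 1;  /* one entry, as the create patch in this series expects */
	args.op = PCS_OP_CREATE;
	args.gpu_id = gpu_id;
	return args;
}
```

The filled structure would then be passed to ioctl() on the KFD file
descriptor with AMDKFD_IOC_PC_SAMPLE. Per the kfd_ioctl() change in this
series, the caller needs CAP_PERFMON or CAP_SYS_ADMIN.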


Re: [PATCH 21/24] drm/amdkfd: add queue remapping

2023-11-23 Thread James Zhu



On 2023-11-23 18:01, Felix Kuehling wrote:

On 2023-11-23 17:41, Greathouse, Joseph wrote:

[Public]


-Original Message-
From: Zhu, James 
Sent: Thursday, November 23, 2023 1:49 PM

On 2023-11-23 14:02, Felix Kuehling wrote:

On 2023-11-23 11:25, James Zhu wrote:

On 2023-11-22 17:35, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running
processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even after the profiling-enabled bit is turned off, trap handlers
for some kernels of this process may still be running. The queue
remapping forces the waves of any running process to complete a CWSR
trap, which ensures that PC sampling has completely stopped for this
process. I will add it later.

It may be confusing to talk specifically about "CWSR trap handler".
There is only one trap handler that is triggered by different events:
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap
triggers, it serializes with any currently running trap handler in
that wavefront. So it seems that you're using CWSR as a way to ensure
that any host trap has completed: CWSR will wait for previous traps to
finish before trapping again for CWSR, the HWS firmware waits for CWSR
completion and the driver waits for HWS to finish CWSR with a fence on
a HIQ QUERY_STATUS packet. Is that correct?

[JZ] I think your explanation is more detailed. Need Joseph to confirm.
Felix, your summary is correct. The reason we are trying to perform a 
queue unmap/map cycle as part of the PC sampling stop is to prevent 
the following:


1. A PC sampling request arrives to Wave X, sending it to 1st-level 
trap handler
2. User thread asks KFD to stop sampling for this process, which 
leads to kfd_pc_sample_stop()
3. kfd_pc_sample_stop() decrements the sampling refcount. If this is 
the last process to stop sampling, it stops any further sampling 
traps from being generated
4. kfd_pc_sample_stop() sets this process's TMA flag to false so 
waves in the 1st-level trap handler know sampling is disabled
 4.1. Wave X may be in 1st-level handler and not yet checked the 
TMA flag. If so, it will exit the 1st-level handler when it sees flag 
is false
 4.2. Wave X may have already passed the 1st-level TMA flag check 
and entered the 2nd-level trap handler to do the PC sample
5. kfd_pc_sample_stop() returns, eventually causing ioctl to return, 
back to user-space
6. Because the stop ioctl has returned, user-land deallocates 
user-space buffer the 2nd level trap handler uses to output sample data
7. Wave X that was in the 2nd-level handler tries to finish its 
sample output and writes to the now-freed location, causing a 
use-after-free


Note that Step 3 does not always stop further traps from arriving -- 
if another process still wants to do sampling, the driver or HW might 
still send traps to every wave on the device after Step 3.
As such, to avoid going into the 2nd-level handler for non-sampled 
processes, all 1st-level handlers must check their TMA flag to see if 
they should allow the sample to flow to the 2nd-level handler.


By removing the queue from the HW after Step 4, we can be sure that 
any existing waves from this process that entered the PC sampling 
2nd-level handler before Step 4 are done.
Any waves that were still in the 1st-level handler at Step 4.1 will 
be filtered by the TMA flag being set to false. CWSR will wait until 
they exit.
Any waves that were already in the 2nd-level handler (4.2) must 
complete before the CWSR save will complete and allow this queue 
removal request to complete.
Any waves that enter the 1st-level trap handler after Step 4 won't go 
into the PC sampling logic in the 2nd-level handler because the TMA 
flag is set to false. CWSR will wait until they exit.


When we then put the queue back on the hardware, any further traps 
that might show up (e.g. because another process is sampling) will 
get filtered by the TMA flag.


So once the queue removal (and thus CWSR save cycle) has completed, 
we can be sure that no other traps to this process will try to use 
its PC sample data buffer, so it's safe to return to user-space and 
let them potentially free that buffer.


I don't know how to summarize this nicely in a comment, but hopefully 
y'all can figure that out. :)


My best summary: We need to ensure that any waves executing the PC 
sampling part of the trap handler are done before kfd_pc_sample_stop 
returns, and that no new waves enter that part of the trap handler 
afterwards. This avoids race conditions that could lead to 
use-after-free. Unmapping and remapping the queues either waits for 
the waves to drain, or preempts them with CWSR, which itself executes 
a trap and waits for previous traps to finish.
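
The ordering in the summary above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: every name
is hypothetical, and the CWSR drain is reduced to clearing a counter. It
shows why clearing the flag alone is not enough and why the unmap/map
cycle has to finish before stop returns.

```c
#include <stdbool.h>

/* Hypothetical model of one process; not a driver structure. */
struct mock_proc {
	bool sampling_flag;   /* models the per-process TMA flag */
	int waves_sampling;   /* waves inside the sampling part of the handler */
	bool queues_mapped;
};

/* Models queue removal: the CWSR save completes only after earlier traps finish. */
static void mock_unmap_queues(struct mock_proc *p)
{
	p->waves_sampling = 0;
	p->queues_mapped = false;
}

static void mock_map_queues(struct mock_proc *p)
{
	p->queues_mapped = true;
}

/* True while some wave could still write to the user's sample buffer. */
static bool mock_buffer_may_be_written(const struct mock_proc *p)
{
	return p->sampling_flag || p->waves_sampling > 0;
}

static void mock_pc_sample_stop(struct mock_proc *p)
{
	p->sampling_flag = false;  /* no new wave enters the sampling path */
	mock_unmap_queues(p);      /* waves already inside it are drained */
	mock_map_queues(p);        /* queues resume; later traps are filtered */
}
```

Only after mock_pc_sample_stop() returns is mock_buffer_may_be_written()
guaranteed to be false, which corresponds to the point where user space
may free the buffer.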



[JZ]  Thanks all!



Regards,
  Felix




Thanks,
-Joe


Regards,
   Felix



Signed-off-by: James Zhu 
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_mana

Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid

2023-11-23 Thread James Zhu



On 2023-11-23 15:32, Felix Kuehling wrote:


On 2023-11-23 15:18, James Zhu wrote:


On 2023-11-22 17:15, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Check pcs_enrty valid for pc sampling ioctl.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 
++--

  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 4c9fc48e1a6a..36366c8847de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct 
kfd_process_device *pdd, uint32_t trace_

  int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args)
  {
+    struct pc_sampling_entry *pcs_entry;
+
+    if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+    args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    args->trace_id);
+    mutex_unlock(&pdd->dev->pcs_data.mutex);


You need to keep holding the lock while the pcs_entry is still used. 
That includes any of the kfd_pc_sample_ functions below. 
Otherwise someone could free it concurrently. It would also simplify 
the ..._ functions, if they didn't have to worry about the 
locking themselves.
[JZ] pcs_entry belongs only to this pc sampling process, and it is 
protected by kfd_process->mutex here.


OK. That's not obvious. I'm also wary about depending too much on the 
big process lock. We will need to make that locking more granular 
soon, because it is causing performance issues with multi-threaded 
processes.

[JZ] Let me add some comments on pcs_entry.
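
A minimal user-space sketch of the pattern Felix describes, where the
lookup and the use of the entry share one critical section. The table,
the lock and the function name are hypothetical, and a pthread mutex
stands in for the kernel mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_ENTRIES 8

/* Hypothetical per-session entry; "owner" plays the role of the pdd. */
struct entry {
	bool in_use;
	bool enabled;
	int owner;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry table[MAX_ENTRIES];

/*
 * Validate the id and flip the enabled state while holding the lock,
 * so the entry cannot be freed or changed between lookup and use.
 */
static int entry_start(unsigned int id, int owner)
{
	int ret = 0;

	pthread_mutex_lock(&table_lock);
	if (id >= MAX_ENTRIES || !table[id].in_use || table[id].owner != owner)
		ret = -EINVAL;
	else if (table[id].enabled)
		ret = -EALREADY;
	else
		table[id].enabled = true;
	pthread_mutex_unlock(&table_lock);

	return ret;
}
```

With this shape the helper does not depend on a second, coarser lock
being held by the caller.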


Regards,
  Felix




Regards,
  Felix



+
+    if (!pcs_entry ||
+    pcs_entry->pdd != pdd)
+    return -EINVAL;
+    }
+
  switch (args->op) {
  case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
  return kfd_pc_sample_query_cap(pdd, args);
@@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device 
*pdd,

  return kfd_pc_sample_create(pdd, args);
    case KFD_IOCTL_PCS_OP_DESTROY:
-    return kfd_pc_sample_destroy(pdd, args->trace_id);
+    if (pcs_entry->enabled)
+    return -EBUSY;
+    else
+    return kfd_pc_sample_destroy(pdd, args->trace_id);
    case KFD_IOCTL_PCS_OP_START:
-    return kfd_pc_sample_start(pdd);
+    if (pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_start(pdd);
    case KFD_IOCTL_PCS_OP_STOP:
-    return kfd_pc_sample_stop(pdd);
+    if (!pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_stop(pdd);
  }
    return -EINVAL;


Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-23 Thread James Zhu



On 2023-11-23 15:21, Felix Kuehling wrote:


On 2023-11-23 15:01, James Zhu wrote:


On 2023-11-22 17:27, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable pc sampling start.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 
+---

  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
  2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 60b29b245db5..33d003ca0093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct 
kfd_process_device *pdd,

  return 0;
  }
  -static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+    struct pc_sampling_entry *pcs_entry)
  {
-    return -EINVAL;
+    bool pc_sampling_start = false;
+
+    pcs_entry->enabled = true;
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+    pc_sampling_start = true;
+ pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    while (pc_sampling_start) {
+    if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {

+    usleep_range(1000, 2000);


I don't understand why you need this synchronization through 
stop_enable. Why can't you do both the start and stop while holding 
the mutex? It's just setting a flag in the TMA, so it's not a 
time-consuming operation, and I don't see any potential for deadlocks.
[JZ] Stop does more than set the TMA flag. It needs to wait for the 
current pc sampling to stop completely and then reset some initial settings.


I think that's being obfuscated by how you split up this patch series. 
Maybe if you squash the queue remapping patch into this one, it would 
be more obvious what's really happening when you stop sampling and 
would make it easier to review the synchronization and locking strategy.

[JZ] Sure
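
For comparison, a user-space sketch of the simpler scheme Felix asks
about, where both start and stop run under one mutex and a counter
decides when sampling is switched on or off. All names are hypothetical.
The sketch does not model the wait that James mentions for stop; if that
wait were done with the mutex held, concurrent starts would block for its
duration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-device sampler state; not a driver structure. */
struct sampler {
	pthread_mutex_t lock;
	unsigned int active_count;  /* number of started sessions */
	bool hw_enabled;            /* models "sampling traps are generated" */
};

/* The first start switches sampling on. */
static void sampler_start(struct sampler *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->active_count++ == 0)
		s->hw_enabled = true;
	pthread_mutex_unlock(&s->lock);
}

/* The last stop switches sampling off; extra stops are ignored. */
static void sampler_stop(struct sampler *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->active_count > 0 && --s->active_count == 0)
		s->hw_enabled = false;
	pthread_mutex_unlock(&s->lock);
}
```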


Regards,
  Felix




Regards,
  Felix



+    } else {
+ kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+ pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+    break;
+    }
+    }
+
+    return 0;
  }
    static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
@@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  if (pcs_entry->enabled)
  return -EALREADY;
  else
-    return kfd_pc_sample_start(pdd);
+    return kfd_pc_sample_start(pdd, pcs_entry);
    case KFD_IOCTL_PCS_OP_STOP:
  if (!pcs_entry->enabled)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 6670534f47b8..613910e0d440 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,8 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    uint32_t active_count;  /* Num of active sessions */
+    bool stop_enable;   /* pc sampling stop in process */
  struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };


Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-11-23 Thread James Zhu



On 2023-11-22 16:14, Felix Kuehling wrote:

On 2023-11-03 09:11, James Zhu wrote:

From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
  include/uapi/linux/kfd_ioctl.h | 57 +-
  1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h 
b/include/uapi/linux/kfd_ioctl.h

index f0ed68974c54..5202e29c9560 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
  };
  };
  +/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling 
capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with 
a per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:    Unregister from a 
previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:  Process begins taking 
samples from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:   Process stops taking 
samples from a previously registered PC sampler instance

+ */
+enum kfd_ioctl_pc_sample_op {
+    KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+    KFD_IOCTL_PCS_OP_CREATE,
+    KFD_IOCTL_PCS_OP_DESTROY,
+    KFD_IOCTL_PCS_OP_START,
+    KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+    KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+    KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+    KFD_IOCTL_PCS_TYPE_TIME_US,
+    KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+    KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+    __u64 value; /* [IN] if PCS_TYPE_INTERVAL_US: sample 
interval in us
+  * if PCS_TYPE_CLOCK_CYCLES: sample 
interval in graphics core clk cycles
+  * if PCS_TYPE_INSTRUCTIONS: sample 
interval in instructions issued by

+  * graphics compute units


I'd call this "interval". That's still generic enough to be a sampling 
interval in a unit that depends on the PCS type. "value" is 
misleading, because it sounds like it may be an actual sample.

[JZ] I am fine with this interface name change.




+  */
+    __u64 value_min; /* [OUT] */
+    __u64 value_max; /* [OUT] */


interval_min/max.

Regards,
  Felix


+    __u64 flags; /* [OUT] indicate potential restrictions 
e.g FLAG_POWER_OF_2 */

+    __u32 method;    /* [IN/OUT] kfd_ioctl_pc_sample_method */
+    __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+    __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+    __u32 num_sample_info;
+    __u32 op;    /* kfd_ioctl_pc_sample_op */
+    __u32 gpu_id;
+    __u32 trace_id;
+};
+
  #define AMDKFD_IOCTL_BASE 'K'
  #define AMDKFD_IO(nr)    _IO(AMDKFD_IOCTL_BASE, nr)
  #define AMDKFD_IOR(nr, type)    _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
  #define AMDKFD_IOC_DBG_TRAP    \
  AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
  +#define AMDKFD_IOC_PC_SAMPLE    \
+    AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
  #define AMDKFD_COMMAND_START    0x01
-#define AMDKFD_COMMAND_END    0x27
+#define AMDKFD_COMMAND_END    0x28
    #endif


Re: [PATCH 05/24] drm/amdkfd: enable pc sampling create

2023-11-23 Thread James Zhu



On 2023-11-22 16:51, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 10 
  2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 49fecbc7013e..f0d910ee730c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,59 @@ static int kfd_pc_sample_stop(struct 
kfd_process_device *pdd)

  static int kfd_pc_sample_create(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *user_args)
  {
-    return -EINVAL;
+    struct kfd_pc_sample_info *supported_format = NULL;
+    struct kfd_pc_sample_info user_info;
+    int ret;
+    int i;
+
+    if (user_args->num_sample_info != 1)
+    return -EINVAL;
+
+    ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr,
+    sizeof(struct kfd_pc_sample_info));
+    if (ret) {
+    pr_debug("Failed to copy PC sampling info from user\n");
+    return -EFAULT;
+    }
+
+    for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+    if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+    && user_info.method == 
supported_formats[i].sample_info->method

+    && user_info.type == supported_formats[i].sample_info->type
+    && user_info.value <= 
supported_formats[i].sample_info->value_max
+    && user_info.value >= 
supported_formats[i].sample_info->value_min) {

+    supported_format =
+    (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;

+    break;
+    }
+    }
+
+    if (!supported_format) {
+    pr_debug("Sampling format is not supported!");
+    return -EOPNOTSUPP;
+    }
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+ memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    &user_info, sizeof(user_info))) {


I think you can compare structures in C. This would be more readable:

if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info != user_info) {
    ...
}
[JZ] Sure
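
One caveat on the suggestion above: ISO C defines assignment for
structure types, but the == and != operators do not accept struct
operands, so the comparison as written would not compile. memcmp() works
but also compares any padding bytes. A field-by-field helper is one
alternative; the sketch below uses a local copy of the struct layout and
a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the layout quoted in this series. */
struct sample_info {
	uint64_t value;
	uint64_t value_min;
	uint64_t value_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Hypothetical helper: compare every member explicitly. */
static bool sample_info_equal(const struct sample_info *a,
			      const struct sample_info *b)
{
	return a->value == b->value &&
	       a->value_min == b->value_min &&
	       a->value_max == b->value_max &&
	       a->flags == b->flags &&
	       a->method == b->method &&
	       a->type == b->type;
}
```

Struct assignment (b = a) is valid C, so the later suggestion to replace
the memcpy() with a plain assignment holds.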


+    ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+ &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    sizeof(struct kfd_pc_sample_info));
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    return ret ? ret : -EEXIST;


When copy_to_user fails, it returns the number of bytes not copied. 
That's not a useful return value here. This should be


    return ret ? -EFAULT : -EEXIST;

Also -EBUSY may be more appropriate than -EEXIST.

[JZ] Sure
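
The return-value point can be shown with a small user-space model.
mock_copy() below is a hypothetical stand-in that, like copy_to_user(),
returns the number of bytes it could not copy; the wrapper maps that
count to an errno value instead of passing it through.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user(): copies at most fail_after
 * bytes and returns how many bytes were NOT copied.
 */
static size_t mock_copy(void *dst, const void *src, size_t n, size_t fail_after)
{
	size_t ok = n < fail_after ? n : fail_after;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Report the existing configuration, then tell the caller why create failed. */
static int report_existing_session(void *dst, const void *src, size_t n,
				   size_t fail_after)
{
	size_t left = mock_copy(dst, src, n, fail_after);

	/* A positive byte count is not an error code; map it to -EFAULT. */
	return left ? -EFAULT : -EEXIST;
}
```

Returning the raw count would hand a positive number back through the
ioctl path, which callers would not treat as a failure.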




+    }
+
+    /* TODO: add trace_id return */
+
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+ memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+    &user_info, sizeof(user_info));


I think you can assign structures in C. Just do

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info;
[JZ] Sure
Regards,
  Felix



+
+    pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    return 0;
  }
    static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, 
uint32_t trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 4a0b66189c67..81c925fb2952 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -256,9 +256,19 @@ struct kfd_vmid_info {
    struct kfd_dev;
  +struct kfd_dev_pc_sampling_data {
+    uint32_t use_count; /* Num of PC sampling sessions */
+    struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+    struct kfd_dev_pc_sampling_data base;
+};
+
  /* Per device PC Sampling data */
  struct kfd_dev_pc_sampling {
  struct mutex mutex;
+    struct kfd_dev_pcs_hosttrap hosttrap_entry;
  };
    struct kfd_node {


Re: [PATCH 06/24] drm/amdkfd: add trace_id return

2023-11-23 Thread James Zhu



On 2023-11-22 16:56, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Add trace_id return for new pc sampling creation per device,
Use IDR to quickly locate pc_sampling_entry for reference.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  6 ++
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
  static void kfd_pc_sampling_init(struct kfd_node *dev)
  {
  mutex_init(&dev->pcs_data.mutex);
+ idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
  {
+ idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
  mutex_destroy(&dev->pcs_data.mutex);
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index f0d910ee730c..4c9fc48e1a6a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  {
  struct kfd_pc_sample_info *supported_format = NULL;
  struct kfd_pc_sample_info user_info;
+    struct pc_sampling_entry *pcs_entry;
  int ret;
  int i;
  @@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  return ret ? ret : -EEXIST;
  }
  -    /* TODO: add trace_id return */
+    pcs_entry = kvzalloc(sizeof(*pcs_entry), GFP_KERNEL);


I don't see a reason to use kvzalloc here. You know the size of the 
structure, so kzalloc should be perfectly fine.

[JZ] Sure, will change to kzalloc




+    if (!pcs_entry) {
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    return -ENOMEM;
+    }
+
+    i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    pcs_entry, 1, 0, GFP_KERNEL);
+    if (i < 0) {
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+    kvfree(pcs_entry);


kfree



+    return i;
+    }
    if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
@@ -149,6 +162,11 @@ static int kfd_pc_sample_create(struct 
kfd_process_device *pdd,

  pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
  mutex_unlock(&pdd->dev->pcs_data.mutex);
  +    pcs_entry->pdd = pdd;
+    user_args->trace_id = (uint32_t)i;


I suspect this should be done inside the lock. You don't want someone 
looking up the pcs_entry before it has been initialized.
[JZ] pcs_entry belongs to this pc sampling process, and it is 
protected by kfd_process->mutex.
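
A user-space sketch of the ordering Felix suggests: the entry is filled
in completely before it becomes reachable through the ID table, and both
the publish and the lookup happen under the table lock. The table is a
hypothetical stand-in for the IDR, with IDs starting at 1 as in the
idr_init_base() call in this patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_IDS 16

/* Hypothetical session entry; "owner" plays the role of the pdd. */
struct session {
	int owner;
	int enabled;
};

static pthread_mutex_t ids_lock = PTHREAD_MUTEX_INITIALIZER;
static struct session *ids[MAX_IDS];  /* slot 0 unused so IDs start at 1 */

/* Returns a new ID (>= 1) or a negative errno value. */
static int session_create(int owner)
{
	struct session *s = calloc(1, sizeof(*s));
	int id = -ENOSPC;

	if (!s)
		return -ENOMEM;
	s->owner = owner;  /* initialise before the entry can be looked up */

	pthread_mutex_lock(&ids_lock);
	for (int i = 1; i < MAX_IDS; i++) {
		if (!ids[i]) {
			ids[i] = s;  /* publish the fully initialised entry */
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&ids_lock);

	if (id < 0)
		free(s);
	return id;
}

/* Read the owner under the lock; returns -ENOENT for unknown IDs. */
static int session_owner(int id)
{
	int owner = -ENOENT;

	pthread_mutex_lock(&ids_lock);
	if (id > 0 && id < MAX_IDS && ids[id])
		owner = ids[id]->owner;
	pthread_mutex_unlock(&ids_lock);
	return owner;
}
```

A lookup that races with session_create() then sees either no entry or a
complete one, never a half-initialised one.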


Regards,
  Felix



+
+    pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);

+
  return 0;
  }
  diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 81c925fb2952..642558026d16 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,7 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };
  @@ -743,6 +744,11 @@ enum kfd_pdd_bound {
   */
  #define SDMA_ACTIVITY_DIVISOR  100
  +struct pc_sampling_entry {
+    bool enabled;
+    struct kfd_process_device *pdd;
+};
+
  /* Data that is per-process-per device. */
  struct kfd_process_device {
  /* The device that owns this data. */


Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid

2023-11-23 Thread James Zhu



On 2023-11-22 17:15, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Check pcs_enrty valid for pc sampling ioctl.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++--
  1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 4c9fc48e1a6a..36366c8847de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct 
kfd_process_device *pdd, uint32_t trace_

  int kfd_pc_sample(struct kfd_process_device *pdd,
  struct kfd_ioctl_pc_sample_args __user *args)
  {
+    struct pc_sampling_entry *pcs_entry;
+
+    if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+    args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+    args->trace_id);
+    mutex_unlock(&pdd->dev->pcs_data.mutex);


You need to keep holding the lock while the pcs_entry is still used. 
That includes any of the kfd_pc_sample_ functions below. Otherwise 
someone could free it concurrently. It would also simplify the 
..._ functions, if they didn't have to worry about the locking 
themselves.
[JZ] pcs_entry belongs only to this pc sampling process, and it is 
protected by kfd_process->mutex here.


Regards,
  Felix



+
+    if (!pcs_entry ||
+    pcs_entry->pdd != pdd)
+    return -EINVAL;
+    }
+
  switch (args->op) {
  case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
  return kfd_pc_sample_query_cap(pdd, args);
@@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  return kfd_pc_sample_create(pdd, args);
    case KFD_IOCTL_PCS_OP_DESTROY:
-    return kfd_pc_sample_destroy(pdd, args->trace_id);
+    if (pcs_entry->enabled)
+    return -EBUSY;
+    else
+    return kfd_pc_sample_destroy(pdd, args->trace_id);
    case KFD_IOCTL_PCS_OP_START:
-    return kfd_pc_sample_start(pdd);
+    if (pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_start(pdd);
    case KFD_IOCTL_PCS_OP_STOP:
-    return kfd_pc_sample_stop(pdd);
+    if (!pcs_entry->enabled)
+    return -EALREADY;
+    else
+    return kfd_pc_sample_stop(pdd);
  }
    return -EINVAL;


Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start

2023-11-23 Thread James Zhu



On 2023-11-22 17:27, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable pc sampling start.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
  2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 60b29b245db5..33d003ca0093 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct 
kfd_process_device *pdd,

  return 0;
  }
  -static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+    struct pc_sampling_entry *pcs_entry)
  {
-    return -EINVAL;
+    bool pc_sampling_start = false;
+
+    pcs_entry->enabled = true;
+    mutex_lock(&pdd->dev->pcs_data.mutex);
+    if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+    pc_sampling_start = true;
+ pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+    mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+    while (pc_sampling_start) {
+        if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
+            usleep_range(1000, 2000);


I don't understand why you need this synchronization through 
stop_enable. Why can't you do both the start and stop while holding 
the mutex? It's just setting a flag in the TMA, so it's not a 
time-consuming operation, and I don't see any potential for deadlocks.
[JZ] Stop does more than set the TMA flag. It needs to wait for the
current PC sampling to stop completely and then reset some initial
settings.
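
For reference, the "do both start and stop while holding the mutex"
idea can be sketched in user space as below. This is only an
illustration under assumptions: pcs_state, pcs_start, pcs_stop and
trap_flag are hypothetical names, a pthread mutex stands in for the
kernel mutex, and the wait for in-flight sampling that JZ mentions is
reduced to a comment. Whether holding the lock across that wait is
acceptable depends on how long the stop takes, which the thread does
not settle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device PC sampling state. */
struct pcs_state {
    pthread_mutex_t lock;
    uint32_t active_count;  /* number of started sessions */
    bool trap_flag;         /* stand-in for the sampling flag in the TMA */
};

#define PCS_STATE_INIT { PTHREAD_MUTEX_INITIALIZER, 0, false }

/* Start one session. Returns true if this call switched sampling on. */
static bool pcs_start(struct pcs_state *s)
{
    bool first;

    pthread_mutex_lock(&s->lock);
    first = (s->active_count++ == 0);
    if (first)
        s->trap_flag = true;
    pthread_mutex_unlock(&s->lock);
    return first;
}

/* Stop one session. Returns true if this call switched sampling off. */
static bool pcs_stop(struct pcs_state *s)
{
    bool last;

    pthread_mutex_lock(&s->lock);
    assert(s->active_count > 0);
    last = (--s->active_count == 0);
    if (last) {
        /* A real stop would also wait here for in-flight sampling to
         * finish and reset its initial settings before unlocking. */
        s->trap_flag = false;
    }
    pthread_mutex_unlock(&s->lock);
    return last;
}
```

Because the count and the flag change under the same lock, a start can
never observe a half-finished stop, so no separate stop_enable polling
loop is needed in this model.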


Regards,
  Felix



+        } else {
+            kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+                pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+            break;
+        }
+    }
+
+    return 0;
  }
    static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
@@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
  if (pcs_entry->enabled)
  return -EALREADY;
  else
-    return kfd_pc_sample_start(pdd);
+    return kfd_pc_sample_start(pdd, pcs_entry);
    case KFD_IOCTL_PCS_OP_STOP:
  if (!pcs_entry->enabled)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

index 6670534f47b8..613910e0d440 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -258,6 +258,8 @@ struct kfd_dev;
    struct kfd_dev_pc_sampling_data {
  uint32_t use_count; /* Num of PC sampling sessions */
+    uint32_t active_count;  /* Num of active sessions */
+    bool stop_enable;   /* pc sampling stop in process */
  struct idr pc_sampling_idr;
  struct kfd_pc_sample_info pc_sample_info;
  };


Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap

2023-11-23 Thread James Zhu



On 2023-11-23 14:08, Felix Kuehling wrote:

On 2023-11-23 13:27, James Zhu wrote:


On 2023-11-22 17:31, Felix Kuehling wrote:


On 2023-11-03 09:11, James Zhu wrote:

Enable a delay work to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
  drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  3 ++
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 
  drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  1 +
  4 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c

index bcaeedac8fe0..fb21902e433a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -35,6 +35,7 @@
  #include "kfd_migrate.h"
  #include "amdgpu.h"
  #include "amdgpu_xcp.h"
+#include "kfd_pc_sampling.h"
    #define MQD_SIZE_ALIGNED 768
@@ -537,6 +538,8 @@ static void kfd_pc_sampling_init(struct kfd_node *dev)
  {
      mutex_init(&dev->pcs_data.mutex);
      idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
+     INIT_WORK(&dev->pcs_data.hosttrap_entry.base.pc_sampling_work,
+         kfd_pc_sample_handler);
  }
    static void kfd_pc_sampling_exit(struct kfd_node *dev)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c

index 2c4ac5b4cc4b..e8f0559b618e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -38,6 +38,43 @@ struct supported_pc_sample_info supported_formats[] = {
  { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
  };

+void kfd_pc_sample_handler(struct work_struct *work)
+{
+    struct amdgpu_device *adev;
+    struct kfd_node *node;
+    uint32_t timeout = 0;
+
+    node = container_of(work, struct kfd_node,
+ pcs_data.hosttrap_entry.base.pc_sampling_work);
+
+    mutex_lock(&node->pcs_data.mutex);
+    if (node->pcs_data.hosttrap_entry.base.active_count &&
+        node->pcs_data.hosttrap_entry.base.pc_sample_info.value &&
+        node->kfd2kgd->trigger_pc_sample_trap) {
+        switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {
+        case KFD_IOCTL_PCS_TYPE_TIME_US:
+            timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.value;
+            break;
+        default:
+            pr_debug("PC Sampling type %d not supported.",
+                node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+        }
+    }
+    mutex_unlock(&node->pcs_data.mutex);
+    if (!timeout)
+    return;
+
+    adev = node->adev;
+    while (!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {


This worker basically runs indefinitely (controlled by user mode).

+        node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+            &node->pcs_data.hosttrap_entry.base.target_simd,
+            &node->pcs_data.hosttrap_entry.base.target_wave_slot,
+            node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+        pr_debug_ratelimited("triggered a host trap.");
+        usleep_range(timeout, timeout + 10);


This will cause the interval to drift. Instead, you should calculate
the wait time at the end of every iteration based on the current time
and the interval.
[JZ] I am wondering what degree of accuracy is required for the
interval. There is a HW timestamp with each PC sampling data packet,
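
To make the drift point concrete, here is a small user-space model of
"calculate the wait time from the current time and the interval". It
is a sketch under assumptions, not the driver code: the helper names
are hypothetical, time is a plain nanosecond counter, and in the
kernel the current time would presumably come from something like
ktime_get() with the result fed to usleep_range().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time to sleep so the next wake-up lands on the next multiple of
 * period_ns after start_ns, no matter how long the work just took.
 */
static uint64_t sleep_until_next_ns(uint64_t start_ns, uint64_t period_ns,
                                    uint64_t now_ns)
{
    assert(period_ns > 0 && now_ns >= start_ns);
    return period_ns - ((now_ns - start_ns) % period_ns);
}

/*
 * Simulate `iters` iterations that each spend work_ns triggering a
 * trap. With compensate == 0 the loop sleeps a fixed period_ns, as the
 * patch does; otherwise it sleeps only until the next period boundary.
 * Returns the simulated time at which the loop finishes.
 */
static uint64_t simulate_end_ns(uint64_t period_ns, uint64_t work_ns,
                                unsigned int iters, int compensate)
{
    uint64_t now = 0;
    unsigned int i;

    for (i = 0; i < iters; i++) {
        now += work_ns;  /* trigger the trap and do bookkeeping */
        now += compensate ? sleep_until_next_ns(0, period_ns, now)
                          : period_ns;
    }
    return now;
}
```

With a fixed sleep, every iteration is late by the time the work took
and the error accumulates. With the compensated sleep, each iteration
ends on a period boundary, so the error does not grow with the number
of samples.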




+    }
+}
+
  static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
                  struct kfd_ioctl_pc_sample_args __user *user_args)
  {
@@ -101,6 +138,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd,
  } else {
      kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
          pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+     schedule_work(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work);


Scheduling a worker that runs indefinitely on the system workqueue 
is probably a bad idea. It could block other work items 
indefinitely. I think you are misusing the work queue API here. What 
you really want is probably to create a kernel thread.
[JZ] Yes, you are right. How about using alloc_workqueue to create a 
dedicated queue instead of the system queue? Is alloc_workqueue more 
efficient than kernel thread creation?


A work queue can create many kernel threads to handle the execution of 
work items. You really only need a single kernel thread per GPU for 
time-based PC sampling. IMO the work queue just adds a bunch of 
overhead. Using a work queue for something that runs indefinitely 
feels like an abuse of the API. I don't have much experience with 
creating kernel threads directly. See include/linux/kthread.h. If you 
want to look for an example, it seems drivers/gpu/drm/scheduler uses 
the kthread API.

[JZ] then let me switch to kthread
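
As a rough picture of the "one dedicated thread per GPU" shape being
discussed, below is a user-space analog built on pthreads. It is not
kernel code: struct sampler and its helpers are hypothetical names, and
the atomic should_stop flag only mimics the cooperative stop that the
kthread API provides through kthread_should_stop() and kthread_stop().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-device sampler with one dedicated thread. */
struct sampler {
    pthread_t thread;
    atomic_bool should_stop;  /* set by sampler_stop(), polled by the thread */
    atomic_uint ticks;        /* stand-in for "host traps triggered" */
    bool running;
};

/* Thread body: loop until asked to stop, pausing between samples. */
static void *sampler_fn(void *arg)
{
    struct sampler *s = arg;
    struct timespec pause = { 0, 1000000 };  /* 1 ms between samples */

    while (!atomic_load(&s->should_stop)) {
        atomic_fetch_add(&s->ticks, 1);  /* stand-in for one trap trigger */
        nanosleep(&pause, NULL);
    }
    return NULL;
}

/* Create the dedicated thread. Returns 0 on success, -1 on failure. */
static int sampler_start(struct sampler *s)
{
    assert(!s->running);
    atomic_store(&s->should_stop, false);
    atomic_store(&s->ticks, 0);
    if (pthread_create(&s->thread, NULL, sampler_fn, s) != 0)
        return -1;
    s->running = true;
    return 0;
}

/* Request a stop and wait until the thread has actually exited. */
static void sampler_stop(struct sampler *s)
{
    assert(s->running);
    atomic_store(&s->should_stop, true);
    pthread_join(s->thread, NULL);
    s->running = false;
}
```

The long-running loop lives in its own thread, so it cannot hold up
unrelated work items, and the stop path returns only after the loop
has exited.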


Regards,
  Felix




Regards,
  Felix



  break;
  }
  }
@@ 
