Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault
On 2024-05-01 18:56, Philip Yang wrote:

On systems with khugepaged enabled and use cases with THP buffers, hmm_range_fault may take more than 15 seconds to return -EBUSY. The arbitrary timeout value is not accurate and causes memory allocation failures.

Remove the arbitrary timeout value and return EAGAIN to the application if hmm_range_fault returns EBUSY; userspace libdrm and Thunk will then call the ioctl again.

Change the EAGAIN message to a debug message, as this is not an error.

Signed-off-by: Philip Yang
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  5 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c          | 12 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c             |  5 +
 3 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 54198c3928c7..02696c2102f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1087,7 +1087,10 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
 
 	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages, &range);
 	if (ret) {
-		pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+		if (ret == -EAGAIN)
+			pr_debug("Failed to get user pages, try again\n");
+		else
+			pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
 		goto unregister_out;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 431ec72655ec..e36fede7f74c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -202,20 +202,12 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 		pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
 			hmm_range->start, hmm_range->end);
 
-		/* Assuming 64MB takes maximum 1 second to fault page address */
-		timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
-		timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
-		timeout = jiffies + msecs_to_jiffies(timeout);
+		timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);

[JZ] should we reduce MAX_WALK_BYTE to 64M in the meantime?

 retry:
 		hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
 		r = hmm_range_fault(hmm_range);
 		if (unlikely(r)) {
-			schedule();

[JZ] the above is for the CPU stall WA, we may still need to keep it.

-			/*
-			 * FIXME: This timeout should encompass the retry from
-			 * mmu_interval_read_retry() as well.
-			 */
 			if (r == -EBUSY && !time_after(jiffies, timeout))
 				goto retry;
 			goto out_free_pfns;
@@ -247,6 +239,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 out_free_range:
 	kfree(hmm_range);
 
+	if (r == -EBUSY)
+		r = -EAGAIN;
 	return r;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 94f83be2232d..e7040f809f33 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1670,11 +1670,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 						       readonly, owner, NULL,
 						       &hmm_range);
 			WRITE_ONCE(p->svms.faulting_task, NULL);
-			if (r) {
+			if (r)
 				pr_debug("failed %d to get svm range pages\n", r);
-				if (r == -EBUSY)
-					r = -EAGAIN;
-			}
 		} else {
 			r = -EFAULT;
 		}
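For illustration only, the caller-side contract described in the commit message (retry while the call reports EAGAIN) could be sketched as below. This is not the libdrm or Thunk code; `op_fn`, `retry_on_eagain` and `fake_busy_op` are hypothetical names, and the bounded retry count is an assumption added so the sketch always terminates.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*op_fn)(void *ctx);

/*
 * Call op until it stops returning -EAGAIN or max_tries attempts were made.
 * Returns the last result; *tries_out (if non-NULL) receives the attempt count.
 */
static int retry_on_eagain(op_fn op, void *ctx, int max_tries, int *tries_out)
{
	int ret = -EAGAIN;
	int tries = 0;

	while (tries < max_tries) {
		ret = op(ctx);
		tries++;
		if (ret != -EAGAIN)
			break;
	}
	if (tries_out)
		*tries_out = tries;
	return ret;
}

/* Stand-in for the ioctl: reports -EAGAIN until *remaining reaches zero. */
static int fake_busy_op(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

A real caller would wrap the actual ioctl in the callback and decide for itself whether an upper bound on retries is appropriate.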
Re: [PATCH] drm/amd/amdxcp: Use unique name for partition dev
On 2024-04-30 07:36, Lijo Lazar wrote:

amdxcp is a platform driver for creating partition devices. The libdrm library identifies a platform device based on 'OF_FULLNAME' or 'MODALIAS'. If two or more devices have the same platform name, the drm library only picks the first device. The platform driver core uses the name of the device to populate 'MODALIAS'. When 'amdxcp' is used as the base name, only the first partition device gets identified. Assign a unique name so that the drm library identifies partition devices separately.

amdxcp doesn't support probing of partitions, so it doesn't care about modaliases.

Signed-off-by: Lijo Lazar

Acked-by: James Zhu

---
 drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
index 90ddd8371176..b4131053b31b 100644
--- a/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
+++ b/drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c
@@ -50,12 +50,14 @@ int amdgpu_xcp_drm_dev_alloc(struct drm_device **ddev)
 {
 	struct platform_device *pdev;
 	struct xcp_device *pxcp_dev;
+	char dev_name[20];
 	int ret;
 
 	if (pdev_num >= MAX_XCP_PLATFORM_DEVICE)
 		return -ENODEV;
 
-	pdev = platform_device_register_simple("amdgpu_xcp", pdev_num, NULL, 0);
+	snprintf(dev_name, sizeof(dev_name), "amdgpu_xcp_%d", pdev_num);
+	pdev = platform_device_register_simple(dev_name, -1, NULL, 0);
 	if (IS_ERR(pdev))
 		return PTR_ERR(pdev);
 
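As a userspace sketch of the naming step in this patch, the snippet below formats a per-partition name into a fixed buffer and reports truncation. `xcp_format_name` and `XCP_NAME_LEN` are hypothetical names used only here; the truncation check is an addition for the sketch and is not part of the patch.

```c
#include <stdio.h>

#define XCP_NAME_LEN 20 /* same size as the dev_name[20] buffer in the patch */

/* Format "amdgpu_xcp_<index>" into buf; returns 0 on success, -1 if truncated. */
static int xcp_format_name(char *buf, size_t len, int index)
{
	int n = snprintf(buf, len, "amdgpu_xcp_%d", index);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a 20-byte buffer the "amdgpu_xcp_" prefix (11 characters) leaves room for an index of up to eight digits plus the terminating NUL.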
Re: [PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942
Ping.

Best Regards!

James Zhu

On 2024-02-06 10:58, James Zhu wrote:

PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: Set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |    7 +
 drivers/gpu/drm/amd/amdkfd/Makefile           |    3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   75 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c        |   26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  426
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   46 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |    2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |    5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |    7 +
 include/uapi/linux/kfd_ioctl.h                |   64 +-
 21 files changed, 1914 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
[PATCH v4 12/24] drm/amdgpu: use trapID 4 for host trap
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, so use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 + 3 files changed, 1070 insertions(+), 1054 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 7d8c0e13ac12..adfe5e5585e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); /* select *target_wave_slot */ value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + /* set TrapID 4 for HOSTTRAP */ + value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4); mutex_lock(>grbm_idx_mutex); amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index af1f678790e7..b3c681d7256b 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf82025e, + 0xbf820001, 0xbf820263, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, - 0x00ff, 0xbf85001e, + 0x00ff, 0xbf850023, 0x866eff7b, 0x0400, - 0xbf85005b, 0xbf8e0010, + 0xbf850060, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, - 0xbf850015, 0x866eff7b, - 0x71ff, 0xbf840008, - 0x866fff7b, 0x7080, - 0xbf840001, 0xbeee1a87, - 0xb8eff801, 0x8e6e8c6e, - 0x866e6f6e, 0xbf85000a, - 0x866eff6d, 0x00ff, - 0xbf850007, 0xb8eef801, - 0x866eff6e, 0x0800, - 0xbf850003, 0x866eff7b, - 0x0400, 0xbf850040, - 
0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xb8faf812, - 0xb8fbf813, 0x8efa887a, - 0xbf0d8f7b, 0xbf840002, - 0x877bff7b, 0x, - 0xc0031c3d, 0x0010, - 0xc0071bbd, 0x, - 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x8671ff6d, - 0x0100, 0xbf840004, - 0x92f1ff70, 0x00010001, - 0xbf840016, 0xbf820005, - 0x86708170, 0x8e709770, - 0x8977ff77, 0x0080, - 0x8077, 0x86ee6e6e, - 0xbf840001, 0xbe801d6e, - 0x866eff6d, 0x01ff, - 0xbf850005, 0x8778ff78, - 0x2000, 0x80ec886c, - 0x82ed806d, 0xbf820005, - 0x866eff6d, 0x0100, - 0xbf850002, 0x806c846c, - 0x826d806d, 0x866dff6d, - 0x, 0x8f7a8b77, + 0xbf85001a, 0x866eff6d, + 0x01ff, 0xbf06ff6e, + 0x0104, 0xbf850015, + 0x866eff7b, 0x71ff, + 0xbf840008, 0x866fff7b, + 0x7080, 0xbf840001, + 0xbeee1a87, 0xb8eff801, + 0x8e6e8c6e, 0x866e6f6e, + 0xbf85000a, 0x866eff6d, + 0x00ff, 0xbf850007, + 0xb8eef801, 0x866eff6e, + 0x0800, 0xbf850003, + 0x866eff7b, 0x0400, + 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, - 0xb97af807, 0x86fe7e7e, - 0x86ea6a6a, 0x8f6e8378, - 0xb96ee0c2, 0xbf82, - 0xb9780002, 0xbe801f6c, + 0x8e7a8b7a, 0x8977ff77, + 0xfc00, 0x8a77, + 0xba7ff807, 0x, + 0xb8faf812, 0xb8fbf813, + 0x8efa887a, 0xbf0d8f7b, + 0xbf840002, 0x877bff7b, + 0x, 0xc0031c3d, + 0x0010, 0xc0071bbd, + 0x, 0xc0071ebd, + 0x0008, 0xbf8cc07f, + 0x8671ff6d, 0x0100, + 0xbf840004, 0x92f1ff70, + 0x00010001, 0xbf840016, + 0xbf820005, 0x86708170, + 0x8e709770, 0x8977ff77, + 0x0080, 0x8077, + 0x86ee6e6e, 0xbf840001, + 0xbe801d6e, 0x866eff6d, + 0x01ff, 0xbf850005, + 0x8778ff78, 0x2000, + 0x80ec886c, 0x82ed806d, + 0xbf820005, 0x866eff6d, + 0x0100, 0xbf850002, + 0x806c846c, 0x826d806d, 0x866dff6d, 0x, - 0xbefa0080, 0xb97a0283, - 0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xbeee007e, - 0xbeef007f, 0xbefe0180, - 0xbf94, 0x877a8478, - 0xb97af802, 0xbf8e0002, - 0xbf88fffe, 0xb8fa2a05, - 0x807a817a, 0x8e7a8
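The commit message of this patch names two TTMP1 fields: bit 24 (HT) and bits 16-23 (trapID), with trapID 4 reserved for the host trap. A host-side sketch of decoding them follows. All macro and function names are hypothetical, and treating "HT set and trapID equal to 4" as the host-trap condition is an assumption drawn from the commit message, not a transcription of the trap handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout as stated in the commit message (names are hypothetical). */
#define TTMP1_TRAP_ID_SHIFT 16
#define TTMP1_TRAP_ID_MASK  0xffu
#define TTMP1_HT_SHIFT      24
#define HOST_TRAP_ID        4u

/* Extract bits 16-23 (trapID). */
static uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
	return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}

/* Extract bit 24 (HT). */
static bool ttmp1_ht(uint32_t ttmp1)
{
	return ((ttmp1 >> TTMP1_HT_SHIFT) & 1u) != 0;
}

/* Assumed condition: HT is set and the trap ID is the host-trap ID. */
static bool ttmp1_is_host_trap(uint32_t ttmp1)
{
	return ttmp1_ht(ttmp1) && ttmp1_trap_id(ttmp1) == HOST_TRAP_ID;
}
```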
[PATCH v4 21/24] drm/amdkfd: add pc sampling thread to trigger trap
Add a kthread to trigger pc sampling trap.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 91 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  1 +
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 6f50ba1f8989..ea9478c3738a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,84 @@ struct supported_pc_sample_info supported_formats[] = {
 	{ IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+	struct amdgpu_device *adev;
+	struct kfd_node *node = param;
+	uint32_t timeout = 0;
+	ktime_t next_trap_time;
+
+	mutex_lock(&node->pcs_data.mutex);
+	if (node->pcs_data.hosttrap_entry.base.active_count &&
+		node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+		node->kfd2kgd->trigger_pc_sample_trap) {
+		switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) {
+		case KFD_IOCTL_PCS_TYPE_TIME_US:
+			timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+			break;
+		default:
+			pr_debug("PC Sampling type %d not supported.",
+				node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+		}
+	}
+	mutex_unlock(&node->pcs_data.mutex);
+	if (!timeout)
+		return -EINVAL;
+
+	adev = node->adev;
+
+	allow_signal(SIGKILL);
+	while (!kthread_should_stop() &&
+		!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable) &&
+		!signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+		next_trap_time = ktime_add_us(ktime_get_raw(), timeout);
+
+		node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd,
+				&node->pcs_data.hosttrap_entry.base.target_simd,
+				&node->pcs_data.hosttrap_entry.base.target_wave_slot,
+				node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+		pr_debug_ratelimited("triggered a host trap.");
+
+		might_sleep();
+		do {
+			ktime_t wait_time;
+			s64 wait_ns, wait_us;
+
+			wait_time = ktime_sub(next_trap_time, ktime_get_raw());
+			wait_ns = ktime_to_ns(wait_time);
+			wait_us = ktime_to_us(wait_time);
+			if (wait_ns >= 1)
+				usleep_range(wait_us - 10, wait_us);
+			else if (wait_ns > 0)
+				schedule();
+			else
+				break;
+		} while (1);
+	}
+	node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+	return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+	char thread_name[16];
+	int ret = 0;
+
+	snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index);
+	node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+		kthread_run(kfd_pc_sample_thread, node, thread_name);
+
+	if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+		ret = PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+		node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+		pr_debug("Failed to create pc sample thread for %s with ret = %d.",
+			thread_name, ret);
+	}
+
+	return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *user_args)
 {
@@ -99,6 +177,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd,
 				struct pc_sampling_entry *pcs_entry)
 {
 	bool pc_sampling_start = false;
+	int ret = 0;
 
 	pcs_entry->enabled = true;
 	mutex_lock(&pdd->dev->pcs_data.mutex);
@@ -112,13 +191,16 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd,
 	mutex_unlock(&pdd->dev->pcs_data.mutex);
 
 	while (pc_sampling_start) {
-		if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+		if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
 			usleep_range(1000, 2000);
-		else
+		} else {
+
[PATCH v4 22/24] drm/amdkfd: add pc sampling release when process release
Add pc sampling release on process release; it forces all active sessions of this process to stop.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 25
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c     |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index ea9478c3738a..783844ddd82f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -337,6 +337,31 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 	return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+	struct pc_sampling_entry *pcs_entry;
+	struct idr *idp;
+	uint32_t id;
+
+	/* force to release all PC sampling task for this process */
+	idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+	do {
+		pcs_entry = NULL;
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		idr_for_each_entry(idp, pcs_entry, id) {
+			if (pcs_entry->pdd != pdd)
+				continue;
+			break;
+		}
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		if (pcs_entry) {
+			if (pcs_entry->enabled)
+				kfd_pc_sample_stop(pdd, pcs_entry);
+			kfd_pc_sample_destroy(pdd, id, pcs_entry);
+		}
+	} while (pcs_entry);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 4a450abf9fa9..bbad0b0848df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n",
 				pdd->dev->id, p->pasid);
 
+		kfd_pc_sample_release(pdd);
+
 		kfd_process_device_destroy_cwsr_dgpu(pdd);
 		kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1
[PATCH v4 23/24] drm/amdkfd: Set debug trap bit when enabling PC Sampling
From: David Yat Sin

We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without the KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit.

It is also not valid to have the debugger attached to a process while PC sampling is enabled, so add some checks to prevent this.

Signed-off-by: David Yat Sin
Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c     | 30 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c       | 26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h       |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 13 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  3 ++
 5 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d9cac97c54c0..bc37f3ee2c66 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2804,26 +2804,9 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug,
 
 	p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED;
 	p->runtime_info.r_debug = r_debug;
-	p->runtime_info.ttmp_setup = enable_ttmp_setup;
 
-	if (p->runtime_info.ttmp_setup) {
-		for (i = 0; i < p->n_pdds; i++) {
-			struct kfd_process_device *pdd = p->pdds[i];
-
-			if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
-				amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
-				pdd->dev->kfd2kgd->enable_debug_trap(
-						pdd->dev->adev,
-						true,
-						pdd->dev->vm_info.last_vmid_kfd);
-			} else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
-				pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap(
-						pdd->dev->adev,
-						false,
-						0);
-			}
-		}
-	}
+	if (enable_ttmp_setup)
+		kfd_dbg_enable_ttmp_setup(p);
 
 retry:
 	if (p->debug_trap_enabled) {
@@ -2972,10 +2955,10 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
 		goto out;
 	}
 
-	/* Check if target is still PTRACED. */
 	rcu_read_lock();
+	/* Check if target is still PTRACED. */
 	if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE
-				&& ptrace_parent(target->lead_thread) != current) {
+		&& ptrace_parent(target->lead_thread) != current) {
 		pr_err("PID %i is not PTRACED and cannot be debugged\n", args->pid);
 		r = -EPERM;
 	}
@@ -2985,6 +2968,11 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
 		goto out;
 
 	mutex_lock(&target->mutex);
+	if (!!target->pc_sampling_ref) {
+		pr_debug("Cannot enable debug trap on PID:%d because PC Sampling active\n", args->pid);
+		r = -EBUSY;
+		goto unlock_out;
+	}
 
 	if (args->op != KFD_IOC_DBG_TRAP_ENABLE && !target->debug_trap_enabled) {
 		pr_err("PID %i not debug enabled for op %i\n", args->pid, args->op);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index d889e3545120..8d836c65c636 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1120,3 +1120,29 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
 
 	mutex_unlock(&target->event_mutex);
 }
+
+void kfd_dbg_enable_ttmp_setup(struct kfd_process *p)
+{
+	int i;
+
+	if (p->runtime_info.ttmp_setup)
+		return;
+
+	p->runtime_info.ttmp_setup = true;
+	for (i = 0; i < p->n_pdds; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+
+		if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
+			amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
+			pdd->dev->kfd2kgd->enable_debug_trap(
+					pdd->dev->adev,
+					true,
+					pdd->dev->vm_info.last_vmid_kfd);
+		} else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
+
[PATCH v4 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability
Bump the minor version to declare that the pc sampling feature is now available.

Signed-off-by: James Zhu
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index ec1b6404b185..7c2c867b57e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -41,9 +41,10 @@
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
  * - 1.15 - Enable managing mappings in compute VMs with GEM_VA ioctl
+ * - 1.16 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 15
+#define KFD_IOCTL_MINOR_VERSION 16
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.25.1
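A userspace consumer could gate on this version bump roughly as sketched below. The function and macro names are hypothetical, and the policy of only accepting major version 1 is an assumption made for the sketch; the patch itself only defines the 1.16 minor bump.

```c
#include <stdbool.h>
#include <stdint.h>

/* Version at which the patch declares PC sampling available. */
#define PCS_KFD_MAJOR     1u
#define PCS_KFD_MIN_MINOR 16u

/*
 * Returns true if a reported KFD ioctl version advertises PC sampling.
 * Assumption: only major version 1 is considered; how a future major
 * version should be treated is not specified by the patch.
 */
static bool kfd_version_has_pc_sampling(uint32_t major, uint32_t minor)
{
	return major == PCS_KFD_MAJOR && minor >= PCS_KFD_MIN_MINOR;
}
```

The major and minor values would come from the `major_version` and `minor_version` fields of `struct kfd_ioctl_get_version_args`.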
[PATCH v4 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device *adev,
+					uint32_t vmid,
+					uint32_t *target_simd,
+					uint32_t *target_wave_slot,
+					enum kfd_ioctl_pc_sample_method method)
+{
+	return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+					target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
+	.trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1
[PATCH v4 18/24] drm/amdkfd: enable pc sampling stop
Enable pc sampling stop.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 29 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  4 +++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b46caa52fbe8..53e44e68408e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,10 +99,33 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd)
 	return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+					struct pc_sampling_entry *pcs_entry)
 {
-	return -EINVAL;
+	bool pc_sampling_stop = false;
+
+	pcs_entry->enabled = false;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+		WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true);
+		pc_sampling_stop = true;
+	}
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
 
+	if (pc_sampling_stop) {
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+		pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+		WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+	}
+
+	return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -250,7 +273,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		if (!pcs_entry->enabled)
 			return -EALREADY;
 		else
-			return kfd_pc_sample_stop(pdd);
+			return kfd_pc_sample_stop(pdd, pcs_entry);
 	}
 
 	return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 5a7805147da0..7bdcbe6be4fe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
 	uint32_t use_count;         /* Num of PC sampling sessions */
+	uint32_t active_count;      /* Num of active sessions */
+	uint32_t target_simd;       /* target simd for trap */
+	uint32_t target_wave_slot;  /* target wave slot for trap */
+	bool stop_enable;           /* pc sampling stop in process */
 	struct idr pc_sampling_idr;
 	struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1
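The stop path above hinges on active_count: only the stop that takes the count to zero raises stop_enable. A single-threaded sketch of that bookkeeping is given below. `struct pcs_counts` and both helpers are hypothetical names, locking is deliberately left out, and the guard against stopping at a zero count is an addition for the sketch that the patch does not contain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced stand-in for the counters added to kfd_dev_pc_sampling_data. */
struct pcs_counts {
	uint32_t active_count; /* number of active sessions */
	bool stop_enable;      /* set while the last session is stopping */
};

/* Record one more active session. */
static void pcs_count_start(struct pcs_counts *c)
{
	c->active_count++;
}

/*
 * Record one session stopping. Returns true when this call stopped the
 * last active session, mirroring the pc_sampling_stop flag in the patch.
 */
static bool pcs_count_stop(struct pcs_counts *c)
{
	if (c->active_count == 0)
		return false; /* sketch-only guard against underflow */

	c->active_count--;
	if (c->active_count == 0) {
		c->stop_enable = true;
		return true;
	}
	return false;
}
```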
[PATCH v4 19/24] drm/amdkfd: add queue remapping
Add queue remapping to ensure that any waves executing the PC sampling part of the trap handler are done before kfd_pc_sample_stop returns, and that no new waves enter that part of the trap handler afterwards. This avoids race conditions that could lead to use-after-free. Unmapping and remapping the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  4 +++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)
 	return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+				enum kfd_unmap_queues_filter filter,
+				uint32_t filter_param,
+				uint32_t grace_period)
+{
+	dqm_lock(dqm);
+	if (!dqm->dev->kfd->shared_resources.enable_mes)
+		execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+	dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+				enum kfd_unmap_queues_filter filter,
+				uint32_t filter_param,
+				uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
 	return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 53e44e68408e..df2f4bfd0cda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
 	uint32_t ip_version;
@@ -115,9 +116,10 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
 
 	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
 		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+	remap_queue(pdd->dev->dqm,
+		KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
 
 	if (pc_sampling_stop) {
-
 		mutex_lock(&pdd->dev->pcs_data.mutex);
 		pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
 		pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
-- 
2.25.1
[PATCH v4 09/24] drm/amdkfd: add interface to trigger pc sampling trap
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..12f9021d563e 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -31,6 +31,8 @@
 #include
 #include
 #include
+#include
+
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
 
@@ -318,6 +320,11 @@ struct kfd2kgd_calls {
 	void (*program_trap_handler_settings)(struct amdgpu_device *adev,
 			uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
 			uint32_t inst);
+	uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+			uint32_t vmid,
+			uint32_t *target_simd,
+			uint32_t *target_wave_slot,
+			enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif	/* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1
[PATCH v4 17/24] drm/amdkfd: add setting trap pc sampling flag
Add setting trap pc sampling flag.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 2df240518d1f..5a7805147da0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
 				  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 				     bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+				enum kfd_ioctl_pc_sample_method method, bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 3e3cead6ccf8..4a450abf9fa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1463,6 +1463,19 @@ void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 	}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+				enum kfd_ioctl_pc_sample_method method, bool enabled)
+{
+	if (qpd->cwsr_kaddr) {
+		volatile unsigned long *tma =
+			(volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+		if (enabled)
+			set_bit(method, &tma[2]);
+		else
+			clear_bit(method, &tma[2]);
+	}
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1
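The patch above flips one bit, indexed by the sampling method, in the third word of the TMA. A plain userspace sketch of the same bit arithmetic follows. `tma_set_pc_sampling_flag` is a hypothetical name, the words are modelled as `uint64_t`, and the update is not atomic, unlike the kernel's set_bit() and clear_bit().

```c
#include <stdbool.h>
#include <stdint.h>

#define TMA_FLAG_WORD 2 /* the patch operates on tma[2] */

/*
 * Set or clear bit 'method' in tma[TMA_FLAG_WORD].
 * Assumes method < 64; this sketch is non-atomic by design.
 */
static void tma_set_pc_sampling_flag(uint64_t *tma, unsigned int method,
				     bool enabled)
{
	uint64_t mask = (uint64_t)1 << method;

	if (enabled)
		tma[TMA_FLAG_WORD] |= mask;
	else
		tma[TMA_FLAG_WORD] &= ~mask;
}
```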
[PATCH v4 05/24] drm/amdkfd: enable pc sampling create
From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 59 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index e9277c9beec7..9267de0bbdac 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -108,7 +108,64 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd) static int kfd_pc_sample_create(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + struct kfd_pc_sample_info *supported_format = NULL; + struct kfd_pc_sample_info user_info; + int ret; + int i; + + if (user_args->num_sample_info != 1) + return -EINVAL; + + ret = copy_from_user(_info, (void __user *) user_args->sample_info_ptr, + sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info from user\n"); + return -EFAULT; + } + + if (user_info.flags & KFD_IOCTL_PCS_FLAG_POWER_OF_2 && + user_info.interval & (user_info.interval - 1)) { + pr_debug("Sampling interval's power is unmatched!"); + return -EINVAL; + } + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version + && user_info.method == supported_formats[i].sample_info->method + && user_info.type == supported_formats[i].sample_info->type + && user_info.interval <= supported_formats[i].sample_info->interval_max + && user_info.interval >= supported_formats[i].sample_info->interval_min) { + supported_format = + (struct kfd_pc_sample_info *)supported_formats[i].sample_info; + break; + } + } + + if (!supported_format) { + pr_debug("Sampling format is not supported!"); + return -EOPNOTSUPP; + } + + mutex_lock(>dev->pcs_data.mutex); + if 
(pdd->dev->pcs_data.hosttrap_entry.base.use_count && + memcmp(>dev->pcs_data.hosttrap_entry.base.pc_sample_info, + _info, sizeof(user_info))) { + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + >dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + return ret ? -EFAULT : -EEXIST; + } + + /* TODO: add trace_id return */ + + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; + + pdd->dev->pcs_data.hosttrap_entry.base.use_count++; + mutex_unlock(>dev->pcs_data.mutex); + + return 0; } static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index f55195fea3df..96999f602224 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -269,9 +269,19 @@ struct kfd_vmid_info { struct kfd_dev; +struct kfd_dev_pc_sampling_data { + uint32_t use_count; /* Num of PC sampling sessions */ + struct kfd_pc_sample_info pc_sample_info; +}; + +struct kfd_dev_pcs_hosttrap { + struct kfd_dev_pc_sampling_data base; +}; + /* Per device PC Sampling data */ struct kfd_dev_pc_sampling { struct mutex mutex; + struct kfd_dev_pcs_hosttrap hosttrap_entry; }; struct kfd_node { -- 2.25.1
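The interval validation in kfd_pc_sample_create() can be illustrated in isolation. The sketch below is a user-space approximation under stated assumptions: sketch_interval_ok and SKETCH_FLAG_POWER_OF_2 are hypothetical names, and the min/max bounds are passed in directly instead of being looked up in supported_formats[].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_POWER_OF_2 0x1ULL	/* stands in for KFD_IOCTL_PCS_FLAG_POWER_OF_2 */

/*
 * Returns true when the requested interval passes both checks from the patch:
 *  - if the power-of-2 flag is set, interval & (interval - 1) must be zero;
 *  - the interval must lie within [min, max].
 */
static bool sketch_interval_ok(uint64_t interval, uint64_t flags,
			       uint64_t min, uint64_t max)
{
	assert(min <= max);

	if ((flags & SKETCH_FLAG_POWER_OF_2) && (interval & (interval - 1)))
		return false;

	return interval >= min && interval <= max;
}
```

Note that an interval of 0 slips through the power-of-2 test, since 0 & (0 - 1) is 0; in the series it is the interval_min of 1 in sample_info_hosttrap_9_0_0 that rejects it.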
[PATCH v4 11/24] drm/amdkfd/gfx9: enable host trap
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index d1caaf0e6a7c..af1f678790e7 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820258, + 0xbf820001, 0xbf82025e, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { }; static const uint32_t cwsr_trap_arcturus_hex[] = { - 0xbf820001, 0xbf8202d4, + 0xbf820001, 0xbf8202da, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 
0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { }; static const uint32_t cwsr_trap_aldebaran_hex[] = { - 0xbf820001, 0xbf8202df, + 0xbf820001, 0xbf8202e5, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f
[PATCH v4 06/24] drm/amdkfd: add trace_id return
Add trace_id return for new pc sampling creation per device. Use an IDR to quickly locate the pc_sampling_entry for reference.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c      |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
 	mutex_init(&dev->pcs_data.mutex);
+	idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }

 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+	idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
 	mutex_destroy(&dev->pcs_data.mutex);
 }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 9267de0bbdac..a607fc148958 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -110,6 +110,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
 {
 	struct kfd_pc_sample_info *supported_format = NULL;
 	struct kfd_pc_sample_info user_info;
+	struct pc_sampling_entry *pcs_entry;
 	int ret;
 	int i;

@@ -157,7 +158,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
 		return ret ? -EFAULT : -EEXIST;
 	}

-	/* TODO: add trace_id return */
+	pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+	if (!pcs_entry) {
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		return -ENOMEM;
+	}
+
+	i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+				pcs_entry, 1, 0, GFP_KERNEL);
+	if (i < 0) {
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		kfree(pcs_entry);
+		return i;
+	}

 	if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
 		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info;
@@ -165,6 +178,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
 	pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
 	mutex_unlock(&pdd->dev->pcs_data.mutex);

+	pcs_entry->pdd = pdd;
+	user_args->trace_id = (uint32_t)i;
+
+	pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id);
+
 	return 0;
 }

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 96999f602224..2df240518d1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;

 struct kfd_dev_pc_sampling_data {
 	uint32_t use_count; /* Num of PC sampling sessions */
+	struct idr pc_sampling_idr;
 	struct kfd_pc_sample_info pc_sample_info;
 };

@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100

+struct pc_sampling_entry {
+	bool enabled;
+	struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
 	/* The device that owns this data. */
-- 
2.25.1
[PATCH v4 16/24] drm/amdkfd: use bit operation set debug trap
The 2nd byte of the 1st-level TMA is used for the trap type setting. Use bit operations so that only the selected bit is changed.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 717a60d7a4ea..3e3cead6ccf8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1443,13 +1443,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported)
 	return true;
 }

+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+	KFD_TRAP_TYPE_DEBUG = 0,	/* bit 0 for debug trap */
+	KFD_TRAP_TYPE_HOST,
+	KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 				     bool enabled)
 {
 	if (qpd->cwsr_kaddr) {
-		uint64_t *tma =
-			(uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-		tma[2] = enabled;
+		volatile unsigned long *tma =
+			(volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+		if (enabled)
+			set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+		else
+			clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
 	}
 }

-- 
2.25.1
[PATCH v4 20/24] drm/amdkfd: enable pc sampling start
Enable pc sampling start.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 27 +---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index df2f4bfd0cda..6f50ba1f8989 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -95,9 +95,30 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
 	return 0;
 }

-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+					struct pc_sampling_entry *pcs_entry)
 {
-	return -EINVAL;
+	bool pc_sampling_start = false;
+
+	pcs_entry->enabled = true;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+
+	kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+		pc_sampling_start = true;
+	pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+	while (pc_sampling_start) {
+		if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable))
+			usleep_range(1000, 2000);
+		else
+			break;
+	}
+
+	return 0;
 }

 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -269,7 +290,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		if (pcs_entry->enabled)
 			return -EALREADY;
 		else
-			return kfd_pc_sample_start(pdd);
+			return kfd_pc_sample_start(pdd, pcs_entry);

 	case KFD_IOCTL_PCS_OP_STOP:
 		if (!pcs_entry->enabled)
-- 
2.25.1
[PATCH v4 14/24] drm/amdkfd: trigger pc sampling trap for arcturus
Implement trigger pc sampling trap for arcturus. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c index 0ba15dcbe4e1..10b362e072a6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c @@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev, return 0; } + +static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4, + target_simd, target_wave_slot, method); +} + const struct kfd2kgd_calls arcturus_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = { .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy, - .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings + .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings, + .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap }; -- 2.25.1
[PATCH v4 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support
From: David Yat Sin Add pc sampling support in kfd_ioctl. The user mode code which uses this new kfd_ioctl is linked to https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface with master branch. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include/uapi/linux/kfd_ioctl.h | 61 +- 1 file changed, 60 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 9ce46edc62a5..ec1b6404b185 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1447,6 +1447,62 @@ struct kfd_ioctl_dbg_trap_args { }; }; +/** + * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations + * + * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities + * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance + * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance + */ +enum kfd_ioctl_pc_sample_op { + KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES, + KFD_IOCTL_PCS_OP_CREATE, + KFD_IOCTL_PCS_OP_DESTROY, + KFD_IOCTL_PCS_OP_START, + KFD_IOCTL_PCS_OP_STOP, +}; + +/* Values have to be a power of 2*/ +#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001 + +enum kfd_ioctl_pc_sample_method { + KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1, + KFD_IOCTL_PCS_METHOD_STOCHASTIC, +}; + +enum kfd_ioctl_pc_sample_type { + KFD_IOCTL_PCS_TYPE_TIME_US, + KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES, + KFD_IOCTL_PCS_TYPE_INSTRUCTIONS +}; + +struct kfd_pc_sample_info { + __u64 interval; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us + * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles + * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by + * graphics compute units + */ + __u64 interval_min; /* [OUT] 
*/ + __u64 interval_max; /* [OUT] */ + __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */ + __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */ + __u32 type; /* [IN/OUT] kfd_ioctl_pc_sample_type */ +}; + +#define KFD_IOCTL_PCS_QUERY_TYPE_FULL (1 << 0) /* If not set, return current */ + +struct kfd_ioctl_pc_sample_args { + __u64 sample_info_ptr; /* array of kfd_pc_sample_info */ + __u32 num_sample_info; + __u32 op;/* kfd_ioctl_pc_sample_op */ + __u32 gpu_id; + __u32 trace_id; + __u32 flags; /* kfd_ioctl_pcs_query flags */ + __u32 reserved; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1567,7 +1623,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_DBG_TRAP\ AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args) +#define AMDKFD_IOC_PC_SAMPLE \ + AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x27 +#define AMDKFD_COMMAND_END 0x28 #endif -- 2.25.1
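To make the new uapi layout concrete, the following is a rough user-space sketch of how a KFD_IOCTL_PCS_OP_CREATE request might be filled in. It is an illustration under assumptions, not the Thunk implementation: the sketch_* structs are local copies of kfd_pc_sample_info and kfd_ioctl_pc_sample_args using stdint types, sketch_prepare_create is a hypothetical helper, and the actual ioctl(fd, AMDKFD_IOC_PC_SAMPLE, &args) call is left out because it needs a KFD device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the enums from the patch (values as in kfd_ioctl.h above). */
enum { SKETCH_PCS_OP_QUERY_CAPABILITIES, SKETCH_PCS_OP_CREATE,
       SKETCH_PCS_OP_DESTROY, SKETCH_PCS_OP_START, SKETCH_PCS_OP_STOP };
enum { SKETCH_PCS_METHOD_HOSTTRAP = 1, SKETCH_PCS_METHOD_STOCHASTIC };
enum { SKETCH_PCS_TYPE_TIME_US, SKETCH_PCS_TYPE_CLOCK_CYCLES,
       SKETCH_PCS_TYPE_INSTRUCTIONS };

/* Same field order and widths as struct kfd_pc_sample_info. */
struct sketch_pc_sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Same field order and widths as struct kfd_ioctl_pc_sample_args. */
struct sketch_pc_sample_args {
	uint64_t sample_info_ptr;
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
	uint32_t flags;
	uint32_t reserved;
};

/* Hypothetical helper: prepare a host-trap, time-based CREATE request. */
static void sketch_prepare_create(struct sketch_pc_sample_args *args,
				  struct sketch_pc_sample_info *info,
				  uint32_t gpu_id, uint64_t interval_us)
{
	assert(args && info);

	memset(info, 0, sizeof(*info));
	info->interval = interval_us;
	info->method = SKETCH_PCS_METHOD_HOSTTRAP;
	info->type = SKETCH_PCS_TYPE_TIME_US;

	memset(args, 0, sizeof(*args));
	args->sample_info_ptr = (uint64_t)(uintptr_t)info;
	args->num_sample_info = 1;	/* CREATE rejects any other count */
	args->op = SKETCH_PCS_OP_CREATE;
	args->gpu_id = gpu_id;
}
```

On success the kernel side of this series writes the allocated id back into trace_id, which the caller would then pass to the START, STOP and DESTROY operations.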
[PATCH v4 08/24] drm/amdkfd: enable pc sampling destroy
Enable pc sampling destroy. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 72c66d4bd24f..b46caa52fbe8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -186,10 +186,24 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) +static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", + pcs_entry, trace_id, pdd->dev->id); + + mutex_lock(>dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.use_count--; + idr_remove(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id); + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + memset(>dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + + kfree(pcs_entry); + + return 0; } int kfd_pc_sample(struct kfd_process_device *pdd, @@ -224,7 +238,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EBUSY; else - return kfd_pc_sample_destroy(pdd, args->trace_id); + return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry); case KFD_IOCTL_PCS_OP_START: if (pcs_entry->enabled) -- 2.25.1
[PATCH v4 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9
Implement trigger pc sampling trap for gfx v9. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 2 files changed, 43 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 5a35a8ca8922..7d8c0e13ac12 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { + uint32_t value = 0; + + value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP); + value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); + + /* select *target_simd */ + value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); + /* select *target_wave_slot */ + value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + + mutex_lock(>grbm_idx_mutex); + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + WREG32_SOC15(GC, 0, mmSQ_CMD, value); + mutex_unlock(>grbm_idx_mutex); + + *target_wave_slot %= max_wave_slot; + if (!(*target_wave_slot)) { + (*target_simd)++; + *target_simd %= max_simd; + } + } else { + pr_debug("PC Sampling method %d not supported.", method); + return -EOPNOTSUPP; + } + return 0; +} + const struct kfd2kgd_calls gfx_v9_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h index ce424615f59b..b47b926891a8 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h @@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct amdgpu_device *adev, uint32_t grace_period, uint32_t *reg_offset, uint32_t *reg_data); +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method); -- 2.25.1
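The target selection in kgd_gfx_v9_trigger_pc_sample_trap() is a round-robin walk over wave slots, moving to the next SIMD each time the wave slot wraps. A minimal sketch of just that bookkeeping, with the register writes left out, is below; sketch_advance_target is a hypothetical name, and the 10 wave slots and 4 SIMDs mentioned afterwards are the values the arcturus wrapper in patch 14/24 passes in.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the (simd, wave_slot) pair after one trap has been issued:
 * the wave slot increments modulo max_wave_slot, and the SIMD increments
 * modulo max_simd only when the wave slot wraps back to 0.
 */
static void sketch_advance_target(uint32_t *simd, uint32_t *wave_slot,
				  uint32_t max_wave_slot, uint32_t max_simd)
{
	assert(max_wave_slot > 0 && max_simd > 0);

	*wave_slot = (*wave_slot + 1) % max_wave_slot;
	if (*wave_slot == 0)
		*simd = (*simd + 1) % max_simd;
}
```

With 10 wave slots and 4 SIMDs, every (simd, wave_slot) combination is visited once per 40 triggers before the sequence repeats.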
[PATCH v4 13/24] drm/amdgpu: add sq host trap status check
Before fire a new host trap, check the host trap status. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++ .../amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++ .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 5 +++ 3 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index adfe5e5585e5..43edd62df5fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev) +{ + uint32_t sq_hosttrap_status = 0x0; + int i, j; + + mutex_lock(>grbm_idx_mutex); + for (i = 0; i < adev->gfx.config.max_shader_engines; i++) { + for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) { + amdgpu_gfx_select_se_sh(adev, i, j, 0x, 0); + sq_hosttrap_status = RREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS); + + if (sq_hosttrap_status & SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) { + WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS, + SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK); + sq_hosttrap_status = 0x0; + continue; + } + if (sq_hosttrap_status) + goto out; + } + } + +out: + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + mutex_unlock(>grbm_idx_mutex); + + return sq_hosttrap_status; +} + uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, uint32_t vmid, uint32_t max_wave_slot, @@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, { if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { uint32_t value = 0; + uint32_t sq_hosttrap_status = 0x0; + + sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev); + /* skip when last host trap request is still pending to complete */ + if (sq_hosttrap_status) + return 0; value = REG_SET_FIELD(value, SQ_CMD, CMD, 
SQ_IND_CMD_CMD_TRAP); value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h index 12d451e5475b..5b17d9066452 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h @@ -462,6 +462,8 @@ #define mmSQ_IND_DATA_BASE_IDX 0 #define mmSQ_CMD 0x037b #define mmSQ_CMD_BASE_IDX 0 +#define mmSQ_HOSTTRAP_STATUS 0x0376 +#define mmSQ_HOSTTRAP_STATUS_BASE_IDX 0 #define mmSQ_TIME_HI 0x037c #define mmSQ_TIME_HI_BASE_IDX 0 #define mmSQ_TIME_LO 0x037d diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h index efc16ddf274a..3dfe4ab31421 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h @@ -2616,6 +2616,11 @@ //SQ_CMD_TIMESTAMP #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 0x0 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK 0x00FFL +//SQ_HOSTTRAP_STATUS +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT 0x0 +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT 0x8 +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK 0x00FFL +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK
[PATCH v4 07/24] drm/amdkfd: check pcs_entry valid
Check that pcs_entry is valid for the pc sampling ioctl.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a607fc148958..72c66d4bd24f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -195,6 +195,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
+	struct pc_sampling_entry *pcs_entry;
+
+	if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+		args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+				args->trace_id);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+		/* pcs_entry is only for this pc sampling process,
+		 * which has kfd_process->mutex protected here.
+		 */
+		if (!pcs_entry ||
+			pcs_entry->pdd != pdd)
+			return -EINVAL;
+	}
+
 	switch (args->op) {
 	case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
 		return kfd_pc_sample_query_cap(pdd, args);
@@ -203,13 +221,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		return kfd_pc_sample_create(pdd, args);

 	case KFD_IOCTL_PCS_OP_DESTROY:
-		return kfd_pc_sample_destroy(pdd, args->trace_id);
+		if (pcs_entry->enabled)
+			return -EBUSY;
+		else
+			return kfd_pc_sample_destroy(pdd, args->trace_id);

 	case KFD_IOCTL_PCS_OP_START:
-		return kfd_pc_sample_start(pdd);
+		if (pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_start(pdd);

 	case KFD_IOCTL_PCS_OP_STOP:
-		return kfd_pc_sample_stop(pdd);
+		if (!pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_stop(pdd);
 	}

 	return -EINVAL;
-- 
2.25.1
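The gating added to kfd_pc_sample() amounts to a small decision table over the operation, the IDR lookup result, and the entry's enabled state. The sketch below restates that table as a pure function for illustration. sketch_check_op and its boolean parameters are hypothetical; a return value of 0 here only means "the real handler would be called", not that the operation succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum sketch_pcs_op {
	SKETCH_OP_QUERY_CAPABILITIES,
	SKETCH_OP_CREATE,
	SKETCH_OP_DESTROY,
	SKETCH_OP_START,
	SKETCH_OP_STOP,
};

/*
 * entry_found:     idr_find() returned a non-NULL pcs_entry for trace_id
 * owned_by_caller: pcs_entry->pdd == pdd
 * enabled:         pcs_entry->enabled
 */
static int sketch_check_op(enum sketch_pcs_op op, bool entry_found,
			   bool owned_by_caller, bool enabled)
{
	/* QUERY and CREATE do not need an existing entry. */
	if (op != SKETCH_OP_QUERY_CAPABILITIES && op != SKETCH_OP_CREATE &&
	    (!entry_found || !owned_by_caller))
		return -EINVAL;

	switch (op) {
	case SKETCH_OP_DESTROY:
		return enabled ? -EBUSY : 0;	/* must be stopped first */
	case SKETCH_OP_START:
		return enabled ? -EALREADY : 0;
	case SKETCH_OP_STOP:
		return enabled ? 0 : -EALREADY;
	default:
		return 0;
	}
}
```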
[PATCH v4 04/24] drm/amdkfd: add pc sampling mutex
Add a per-node pc sampling mutex. Initialize it in node init and destroy it in node cleanup.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
 	spin_lock_init(&dev->smi_lock);
 }

+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+	mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+	mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
 	int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
 	}

 	kfd_smi_init(node);
+	kfd_pc_sampling_init(node);

 	return 0;

@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 		kfd_topology_remove_device(knode);
 		if (knode->gws)
 			amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+		kfd_pc_sampling_exit(knode);
 		kfree(knode);
 		kfd->nodes[i] = NULL;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index ae9a41670909..f55195fea3df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {

 struct kfd_dev;

+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+	struct mutex mutex;
+};
+
 struct kfd_node {
 	unsigned int node_id;
 	struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
 	struct kfd_local_mem_info local_mem_info;

 	struct kfd_dev *kfd;
+
+	struct kfd_dev_pc_sampling pcs_data;
 };

 struct kfd_dev {
-- 
2.25.1
[PATCH v4 03/24] drm/amdkfd: enable pc sampling query
From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 65 +++- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index a7e78ff42d07..e9277c9beec7 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -25,10 +25,73 @@ #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +struct supported_pc_sample_info { + uint32_t ip_version; + const struct kfd_pc_sample_info *sample_info; +}; + +const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = { + 0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US }; + +struct supported_pc_sample_info supported_formats[] = { + { IP_VERSION(9, 4, 1), _info_hosttrap_9_0_0 }, + { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, +}; + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + uint64_t sample_offset; + int num_method = 0; + int ret; + int i; + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) + num_method++; + + if (!num_method) { + pr_debug("PC Sampling not supported on GC_HWIP:0x%x.", + pdd->dev->adev->ip_versions[GC_HWIP][0]); + return -EOPNOTSUPP; + } + + ret = 0; + mutex_lock(>dev->pcs_data.mutex); + if (user_args->flags != KFD_IOCTL_PCS_QUERY_TYPE_FULL && + pdd->dev->pcs_data.hosttrap_entry.base.use_count) { + /* If we already have a session, restrict returned list to current method */ + user_args->num_sample_info = 1; + + if (user_args->sample_info_ptr) + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + >dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + 
return ret ? -EFAULT : 0; + } + mutex_unlock(>dev->pcs_data.mutex); + + if (!user_args->sample_info_ptr || user_args->num_sample_info < num_method) { + user_args->num_sample_info = num_method; + pr_debug("ASIC requires space for %d kfd_pc_sample_info entries.", num_method); + return -ENOSPC; + } + + sample_offset = user_args->sample_info_ptr; + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) { + ret = copy_to_user((void __user *) sample_offset, + supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info to user."); + return -EFAULT; + } + sample_offset += sizeof(struct kfd_pc_sample_info); + } + } + + return 0; } static int kfd_pc_sample_start(struct kfd_process_device *pdd) -- 2.25.1
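The capability query follows a common two-call pattern: the caller first learns how many entries it needs (via -ENOSPC and the updated num_sample_info), then calls again with a large enough buffer. The sketch below shows only that sizing logic, as an illustration. sketch_query_cap is a hypothetical name, the version numbers used with it are arbitrary placeholders standing in for IP_VERSION() values, and the "session already active" branch and the copy_to_user() calls are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * supported/n_supported: ip versions with a sampling format (one per format)
 * gc_version:            version of the queried device
 * num_sample_info:       in: entries the caller has room for; out: entries needed
 * have_buffer:           non-zero if the caller passed a sample_info buffer
 */
static int sketch_query_cap(const uint32_t *supported, size_t n_supported,
			    uint32_t gc_version, uint32_t *num_sample_info,
			    int have_buffer)
{
	uint32_t num_method = 0;
	size_t i;

	assert(supported && num_sample_info);

	for (i = 0; i < n_supported; i++)
		if (supported[i] == gc_version)
			num_method++;

	if (!num_method)
		return -EOPNOTSUPP;	/* no format for this device */

	if (!have_buffer || *num_sample_info < num_method) {
		*num_sample_info = num_method;	/* tell the caller what to allocate */
		return -ENOSPC;
	}

	return 0;	/* the real code would now copy num_method entries out */
}
```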
[PATCH v4 02/24] drm/amdkfd: add pc sampling support
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 +++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 5 files changed, 172 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index a5ae7bcf44eb..790fd028a681 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -57,7 +57,8 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ - $(AMDKFD_PATH)/kfd_debug.o + $(AMDKFD_PATH)/kfd_debug.o \ + $(AMDKFD_PATH)/kfd_pc_sampling.o ifneq ($(CONFIG_DEBUG_FS),) AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 80e90fdef291..d9cac97c54c0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -41,6 +41,7 @@ #include "kfd_priv.h" #include "kfd_device_queue_manager.h" #include "kfd_svm.h" +#include "kfd_pc_sampling.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" #include "amdgpu_dma_buf.h" @@ -1745,6 +1746,39 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) } #endif +static int kfd_ioctl_pc_sample(struct file *filep, + struct kfd_process *p, void __user *data) +{ + struct kfd_ioctl_pc_sample_args *args = data; + struct kfd_process_device *pdd; + int ret = 0; + + if (sched_policy == KFD_SCHED_POLICY_NO_HWS) { + pr_err("PC Sampling does not support sched_policy %i", sched_policy); + return -EINVAL; + } + 
+ mutex_lock(>mutex); + pdd = kfd_process_device_data_by_id(p, args->gpu_id); + + if (!pdd) { + pr_debug("could not find gpu id 0x%x.", args->gpu_id); + ret = -EINVAL; + } else if (args->op == KFD_IOCTL_PCS_OP_START) { + pdd = kfd_bind_process_to_device(pdd->dev, p); + if (IS_ERR(pdd)) { + pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); + ret = -ESRCH; + } + } + + if (!ret) + ret = kfd_pc_sample(pdd, args); + mutex_unlock(>mutex); + + return ret; +} + static int criu_checkpoint_process(struct kfd_process *p, uint8_t __user *user_priv_data, uint64_t *priv_offset) @@ -3219,6 +3253,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = { AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP, kfd_ioctl_set_debug_trap, 0), + + AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE, + kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON), }; #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls) @@ -3295,6 +3332,14 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) } } + /* PC Sampling Monitor */ + if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) { + if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) { + retcode = -EACCES; + goto err_i1; + } + } + if (cmd & (IOC_IN | IOC_OUT)) { if (asize <= sizeof(stack_kdata)) { kdata = stack_kdata; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c new file mode 100644 index ..a7e78ff42d07 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright 2023 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above
[PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942
PC sampling is a form of software profiling, where the threads of an
application are periodically interrupted and the program counter that
the threads are currently attempting to execute is saved out for
profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: Set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |    7 +
 drivers/gpu/drm/amd/amdkfd/Makefile           |    3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   75 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c        |   26 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  426
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   46 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |    2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |    5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |    7 +
 include/uapi/linux/kfd_ioctl.h                |   64 +-
 21 files changed, 1914 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

--
2.25.1
Re: [PATCH] drm/amdgpu: make a correction on comment
On 2024-01-08 03:12, Christian König wrote:
> On 02.01.24 at 21:56, James Zhu wrote:
>> Current AMDGPU_VM_RESERVED_VRAM is updated to 8M.
>>
>> Signed-off-by: James Zhu
>
> Maybe remove the value completely from the comment, just something
> like "How much memory be reserved for page tables".

[JZ] This will work better. Thanks!

> Either way Reviewed-by: Christian König
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index b6cd565562ad..b788067b9158 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -116,7 +116,7 @@ struct amdgpu_mem_stats;
>>  #define AMDGPU_VM_FAULT_STOP_FIRST	1
>>  #define AMDGPU_VM_FAULT_STOP_ALWAYS	2
>>
>> -/* Reserve 4MB VRAM for page tables */
>> +/* Reserve 8MB VRAM for page tables */
>>  #define AMDGPU_VM_RESERVED_VRAM		(8ULL << 20)
>>
>>  /*
[PATCH] drm/amdgpu: make a correction on comment
Current AMDGPU_VM_RESERVED_VRAM is updated to 8M.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index b6cd565562ad..b788067b9158 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -116,7 +116,7 @@ struct amdgpu_mem_stats;
 #define AMDGPU_VM_FAULT_STOP_FIRST	1
 #define AMDGPU_VM_FAULT_STOP_ALWAYS	2

-/* Reserve 4MB VRAM for page tables */
+/* Reserve 8MB VRAM for page tables */
 #define AMDGPU_VM_RESERVED_VRAM		(8ULL << 20)

 /*
--
2.25.1
Re: [PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling
On 2023-12-15 10:59, James Zhu wrote: From: David Yat Sin We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit. It is also not valid to have the debugger attached to a process while PC sampling is enabled so adding some checks to prevent this. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 22 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 43 +--- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 4 +- 5 files changed, 75 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 1a3a8ded9c93..f7a8794c2bde 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1775,7 +1775,7 @@ static int kfd_ioctl_pc_sample(struct file *filep, pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); ret = -ESRCH; } else { - ret = kfd_pc_sample(pdd, args); + ret = kfd_pc_sample(p, pdd, args); } } mutex_unlock(>mutex); @@ -2808,26 +2808,9 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug, p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED; p->runtime_info.r_debug = r_debug; - p->runtime_info.ttmp_setup = enable_ttmp_setup; - if (p->runtime_info.ttmp_setup) { - for (i = 0; i < p->n_pdds; i++) { - struct kfd_process_device *pdd = p->pdds[i]; - - if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { - amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - true, - pdd->dev->vm_info.last_vmid_kfd); - } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { - pdd->spi_dbg_override = 
pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - false, - 0); - } - } - } + if (enable_ttmp_setup) + kfd_dbg_enable_ttmp_setup(p); retry: if (p->debug_trap_enabled) { @@ -2976,9 +2959,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v goto out; } - /* Check if target is still PTRACED. */ rcu_read_lock(); - if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE + + if (kfd_pc_sampling_enabled(target)) { + pr_debug("Cannot enable debug trap on PID:%d because PC Sampling active\n", args->pid); + r = -EBUSY; + /* Check if target is still PTRACED. */ + } else if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE && ptrace_parent(target->lead_thread) != current) { pr_err("PID %i is not PTRACED and cannot be debugged\n", args->pid); r = -EPERM; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 9ec750666382..092c2dc84d24 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -1118,3 +1118,25 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target, mutex_unlock(>event_mutex); } + +void kfd_dbg_enable_ttmp_setup(struct kfd_process *p) +{ + int i; + p->runtime_info.ttmp_setup = true; + for (i = 0; i < p->n_pdds; i++) { + struct kfd_process_device *pdd = p->pdds[i]; + + if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { + amdgpu_gfx_off_ctrl(pdd->dev->adev, false); + pdd->dev->kfd2kgd->enable_debug_trap( + pdd->dev->adev, + true, + pdd->dev->vm_info.last_vmid_kfd); + } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { + pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable
[PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling
From: David Yat Sin We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit. It is also not valid to have the debugger attached to a process while PC sampling is enabled so adding some checks to prevent this. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 31 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 22 ++ drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 43 +--- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 4 +- 5 files changed, 75 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 1a3a8ded9c93..f7a8794c2bde 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1775,7 +1775,7 @@ static int kfd_ioctl_pc_sample(struct file *filep, pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); ret = -ESRCH; } else { - ret = kfd_pc_sample(pdd, args); + ret = kfd_pc_sample(p, pdd, args); } } mutex_unlock(>mutex); @@ -2808,26 +2808,9 @@ static int runtime_enable(struct kfd_process *p, uint64_t r_debug, p->runtime_info.runtime_state = DEBUG_RUNTIME_STATE_ENABLED; p->runtime_info.r_debug = r_debug; - p->runtime_info.ttmp_setup = enable_ttmp_setup; - if (p->runtime_info.ttmp_setup) { - for (i = 0; i < p->n_pdds; i++) { - struct kfd_process_device *pdd = p->pdds[i]; - - if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { - amdgpu_gfx_off_ctrl(pdd->dev->adev, false); - pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - true, - pdd->dev->vm_info.last_vmid_kfd); - } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { - pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap( - pdd->dev->adev, - false, - 0); - 
} - } - } + if (enable_ttmp_setup) + kfd_dbg_enable_ttmp_setup(p); retry: if (p->debug_trap_enabled) { @@ -2976,9 +2959,13 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v goto out; } - /* Check if target is still PTRACED. */ rcu_read_lock(); - if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE + + if (kfd_pc_sampling_enabled(target)) { + pr_debug("Cannot enable debug trap on PID:%d because PC Sampling active\n", args->pid); + r = -EBUSY; + /* Check if target is still PTRACED. */ + } else if (target != p && args->op != KFD_IOC_DBG_TRAP_DISABLE && ptrace_parent(target->lead_thread) != current) { pr_err("PID %i is not PTRACED and cannot be debugged\n", args->pid); r = -EPERM; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 9ec750666382..092c2dc84d24 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -1118,3 +1118,25 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target, mutex_unlock(>event_mutex); } + +void kfd_dbg_enable_ttmp_setup(struct kfd_process *p) +{ + int i; + p->runtime_info.ttmp_setup = true; + for (i = 0; i < p->n_pdds; i++) { + struct kfd_process_device *pdd = p->pdds[i]; + + if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) { + amdgpu_gfx_off_ctrl(pdd->dev->adev, false); + pdd->dev->kfd2kgd->enable_debug_trap( + pdd->dev->adev, + true, + pdd->dev->vm_info.last_vmid_kfd); + } else if (kfd_dbg_is_per_vmid_supported(pdd->dev)) { + pdd->spi_dbg_override = pdd->dev->kfd2kgd->enable_debug_trap( + pdd->dev->adev, + false, + 0); + } + } +} \ No newline at end of file diff --git
[PATCH v3 17/24] drm/amdkfd: add setting trap pc sampling flag
Add setting trap pc sampling flag.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7ca7cc726246..b9a36891d099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
 				  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 				     bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+				enum kfd_ioctl_pc_sample_method method, bool enabled);

 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 1a31b556a5ff..6bc9dcfad484 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1460,6 +1460,19 @@ void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 	}
 }

+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+				enum kfd_ioctl_pc_sample_method method, bool enabled)
+{
+	if (qpd->cwsr_kaddr) {
+		volatile unsigned long *tma =
+			(volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+		if (enabled)
+			set_bit(method, &tma[2]);
+		else
+			clear_bit(method, &tma[2]);
+	}
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
--
2.25.1
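As a reading aid for the flag handling in this patch, here is a small standalone sketch of what setting and clearing the per-method bit in the third TMA word amounts to. It is only an illustration under stated assumptions: the TMA is modelled as an array of 64-bit words (matching unsigned long on a 64-bit kernel), the helper names are hypothetical, method is assumed to be below 64, and plain bit operations are used, so the atomicity of the kernel's set_bit()/clear_bit() is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the TMA word carrying the PC sampling flags (tma[2] in the patch). */
#define TMA_PCS_WORD 2

/* Set or clear the bit for one sampling method in the flag word.
 * Non-atomic, unlike the kernel helpers; assumes method < 64.
 */
static void set_pc_sampling_flag(uint64_t *tma, unsigned int method, bool enabled)
{
	uint64_t mask = UINT64_C(1) << method;

	if (enabled)
		tma[TMA_PCS_WORD] |= mask;
	else
		tma[TMA_PCS_WORD] &= ~mask;
}

/* Report whether the bit for a sampling method is currently set. */
static bool pc_sampling_flag_is_set(const uint64_t *tma, unsigned int method)
{
	return (tma[TMA_PCS_WORD] >> method) & 1;
}
```

Using one bit per method keeps the word usable by several sampling methods at once, which is presumably why the patch indexes the bit by the method enum rather than storing a boolean.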
[PATCH v3 18/24] drm/amdkfd: enable pc sampling stop
Enable pc sampling stop.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  4 +++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 07e4c4a32e7b..02fa481d7457 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -88,10 +88,32 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd)
 	return -EINVAL;
 }

-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+					struct pc_sampling_entry *pcs_entry)
 {
-	return -EINVAL;
+	bool pc_sampling_stop = false;
+
+	pcs_entry->enabled = false;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+		WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true);
+		pc_sampling_stop = true;
+	}
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+	if (pc_sampling_stop) {
+		kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+			pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+		pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+		WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+	}
+
+	return 0;
 }

 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -233,7 +255,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		if (!pcs_entry->enabled)
 			return -EALREADY;
 		else
-			return kfd_pc_sample_stop(pdd);
+			return kfd_pc_sample_stop(pdd, pcs_entry);
 	}

 	return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b9a36891d099..0839a0ca3099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;

 struct kfd_dev_pc_sampling_data {
 	uint32_t use_count;         /* Num of PC sampling sessions */
+	uint32_t active_count;      /* Num of active sessions */
+	uint32_t target_simd;       /* target simd for trap */
+	uint32_t target_wave_slot;  /* target wave slot for trap */
+	bool stop_enable;           /* pc sampling stop in process */
 	struct idr pc_sampling_idr;
 	struct kfd_pc_sample_info pc_sample_info;
 };
--
2.25.1
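The control flow of the stop path may be easier to follow in isolation: only the session that brings active_count to zero tears down the shared trap state. The sketch below models just that ordering in plain C. Locking is deliberately left out and marked in comments, struct session, struct dev_state and stop_session() are hypothetical names, and the trap_flag field merely stands in for the TMA flag cleared by kfd_process_set_trap_pc_sampling_flag().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session state (pcs_entry->enabled in the patch). */
struct session {
	bool enabled;
};

/* Hypothetical per-device state mirroring the fields added to kfd_priv.h. */
struct dev_state {
	uint32_t active_count;     /* number of active sessions */
	uint32_t target_simd;      /* target simd for trap */
	uint32_t target_wave_slot; /* target wave slot for trap */
	bool stop_enable;          /* stop in progress */
	bool trap_flag;            /* stands in for the TMA pc sampling flag */
};

/* Stop one session; returns true if it was the last active one. */
static bool stop_session(struct dev_state *dev, struct session *s)
{
	bool last;

	s->enabled = false;

	/* The patch holds pcs_data.mutex around this block. */
	dev->active_count--;
	last = (dev->active_count == 0);
	if (last)
		dev->stop_enable = true;

	if (last) {
		/* Done without the mutex in the patch. */
		dev->trap_flag = false;

		/* The patch retakes pcs_data.mutex around this block. */
		dev->target_simd = 0;
		dev->target_wave_slot = 0;
		dev->stop_enable = false;
	}

	return last;
}
```

With two active sessions, the first stop leaves the trap flag and targets untouched and only the second one resets them.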
[PATCH v3 22/24] drm/amdkfd: add pc sampling release when process release
Add pc sampling release on process release. This forces all active
sessions belonging to the process to stop.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c     |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index c95d9ff08f6a..d8286aabd5a7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -300,6 +300,27 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 	return 0;
 }

+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+	struct pc_sampling_entry *pcs_entry;
+	struct idr *idp;
+	uint32_t id;
+
+	/* force to release all PC sampling task for this process */
+	idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	idr_for_each_entry(idp, pcs_entry, id) {
+		if (pcs_entry->pdd != pdd)
+			continue;
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		if (pcs_entry->enabled)
+			kfd_pc_sample_stop(pdd, pcs_entry);
+		kfd_pc_sample_destroy(pdd, id, pcs_entry);
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+	}
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@

 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);

 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6bc9dcfad484..1f8d6098dfb2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"

 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n",
 				pdd->dev->id, p->pasid);

+		kfd_pc_sample_release(pdd);
+
 		kfd_process_device_destroy_cwsr_dgpu(pdd);
 		kfd_process_device_destroy_ib_mem(pdd);

--
2.25.1
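To make the release semantics concrete, here is a rough standalone model of the loop in kfd_pc_sample_release(): walk every session on the device, skip those owned by other processes, stop the ones still enabled, then destroy them. The fixed-size table replaces the idr, the integer owner replaces the pdd pointer, and all names (struct entry, release_owner()) are hypothetical. The mutex drop and retake around stop/destroy is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical session slot; the patch keeps these in an idr instead. */
struct entry {
	int owner;    /* stands in for the pcs_entry->pdd pointer */
	bool in_use;  /* slot holds a live session */
	bool enabled; /* session is currently sampling */
};

/* Stop and destroy every session owned by `owner`.
 * Returns the number of sessions destroyed.
 */
static size_t release_owner(struct entry *table, size_t n, int owner)
{
	size_t destroyed = 0;

	for (size_t i = 0; i < n; i++) {
		/* Sessions of other processes are left alone. */
		if (!table[i].in_use || table[i].owner != owner)
			continue;

		/* kfd_pc_sample_stop() in the patch, only if still enabled. */
		if (table[i].enabled)
			table[i].enabled = false;

		/* kfd_pc_sample_destroy() in the patch. */
		table[i].in_use = false;
		destroyed++;
	}

	return destroyed;
}
```

Stopping before destroying matters because an enabled session still counts towards the device's active session count.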
[PATCH v3 11/24] drm/amdkfd/gfx9: enable host trap
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index df75863393fc..747426bd5181 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820258, + 0xbf820001, 0xbf82025e, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { }; static const uint32_t cwsr_trap_arcturus_hex[] = { - 0xbf820001, 0xbf8202d4, + 0xbf820001, 0xbf8202da, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 
0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { }; static const uint32_t cwsr_trap_aldebaran_hex[] = { - 0xbf820001, 0xbf8202df, + 0xbf820001, 0xbf8202e5, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f
[PATCH v3 13/24] drm/amdgpu: add sq host trap status check
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev,
 	kgd_gfx_v9_unlock_srbm(adev, inst);
 }

+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+	uint32_t sq_hosttrap_status = 0x0;
+	int i, j;
+
+	mutex_lock(&adev->grbm_idx_mutex);
+	for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+		for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+			amdgpu_gfx_select_se_sh(adev, i, j, 0x, 0);
+			sq_hosttrap_status = RREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS);
+
+			if (sq_hosttrap_status & SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+				WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+					SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+				sq_hosttrap_status = 0x0;
+				continue;
+			}
+			if (sq_hosttrap_status)
+				goto out;
+		}
+	}
+
+out:
+	amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0);
+	mutex_unlock(&adev->grbm_idx_mutex);
+
+	return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
 					    uint32_t vmid,
 					    uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
 {
 	if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
 		uint32_t value = 0;
+		uint32_t sq_hosttrap_status = 0x0;
+
+		sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+		/* skip when last host trap request is still pending to complete */
+		if (sq_hosttrap_status)
+			return 0;

 		value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
 		value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 0
 #define mmSQ_CMD 0x037b
 #define mmSQ_CMD_BASE_IDX 0
+#define mmSQ_HOSTTRAP_STATUS 0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX 0
 #define mmSQ_TIME_HI 0x037c
 #define mmSQ_TIME_HI_BASE_IDX 0
 #define mmSQ_TIME_LO 0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK 0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT 0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT 0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK 0x00FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK
[PATCH v3 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability
Bump the minor version to indicate that the pc sampling feature is now
available.

Signed-off-by: James Zhu
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1bd1347effea..62d8642d3d1c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -40,9 +40,10 @@
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
+ * - 1.15 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 14
+#define KFD_IOCTL_MINOR_VERSION 15

 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
--
2.25.1
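On the consumer side, the version bump lets userspace gate use of the new ioctl on the reported KFD interface version. A minimal sketch of such a check follows. The helper name is hypothetical, the major and minor values are assumed to come from the existing get-version query, and treating only major version 1 as supported is an assumption of this sketch, since the patch says nothing about other major versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First interface version that advertises PC sampling, per this patch. */
#define PCS_MAJOR     1u
#define PCS_MIN_MINOR 15u

/* Returns true if the reported KFD ioctl version includes PC sampling.
 * Only major version 1 is considered; other majors are out of scope here.
 */
static bool kfd_version_has_pc_sampling(uint32_t major, uint32_t minor)
{
	return major == PCS_MAJOR && minor >= PCS_MIN_MINOR;
}
```

A client would typically run this once at startup and fall back to profiling without PC sampling when it returns false.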
[PATCH v3 12/24] drm/amdgpu: use trapID 4 for host trap
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, so use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 + 3 files changed, 1070 insertions(+), 1054 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 7d8c0e13ac12..adfe5e5585e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); /* select *target_wave_slot */ value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + /* set TrapID 4 for HOSTTRAP */ + value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4); mutex_lock(>grbm_idx_mutex); amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 747426bd5181..44955838f307 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf82025e, + 0xbf820001, 0xbf820263, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, - 0x00ff, 0xbf85001e, + 0x00ff, 0xbf850023, 0x866eff7b, 0x0400, - 0xbf85005b, 0xbf8e0010, + 0xbf850060, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, - 0xbf850015, 0x866eff7b, - 0x71ff, 0xbf840008, - 0x866fff7b, 0x7080, - 0xbf840001, 0xbeee1a87, - 0xb8eff801, 0x8e6e8c6e, - 0x866e6f6e, 0xbf85000a, - 0x866eff6d, 0x00ff, - 0xbf850007, 0xb8eef801, - 0x866eff6e, 0x0800, - 0xbf850003, 0x866eff7b, - 0x0400, 0xbf850040, - 
0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xb8faf812, - 0xb8fbf813, 0x8efa887a, - 0xbf0d8f7b, 0xbf840002, - 0x877bff7b, 0x, - 0xc0031c3d, 0x0010, - 0xc0071bbd, 0x, - 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x8671ff6d, - 0x0100, 0xbf840004, - 0x92f1ff70, 0x00010001, - 0xbf840016, 0xbf820005, - 0x86708170, 0x8e709770, - 0x8977ff77, 0x0080, - 0x8077, 0x86ee6e6e, - 0xbf840001, 0xbe801d6e, - 0x866eff6d, 0x01ff, - 0xbf850005, 0x8778ff78, - 0x2000, 0x80ec886c, - 0x82ed806d, 0xbf820005, - 0x866eff6d, 0x0100, - 0xbf850002, 0x806c846c, - 0x826d806d, 0x866dff6d, - 0x, 0x8f7a8b77, + 0xbf85001a, 0x866eff6d, + 0x01ff, 0xbf06ff6e, + 0x0104, 0xbf850015, + 0x866eff7b, 0x71ff, + 0xbf840008, 0x866fff7b, + 0x7080, 0xbf840001, + 0xbeee1a87, 0xb8eff801, + 0x8e6e8c6e, 0x866e6f6e, + 0xbf85000a, 0x866eff6d, + 0x00ff, 0xbf850007, + 0xb8eef801, 0x866eff6e, + 0x0800, 0xbf850003, + 0x866eff7b, 0x0400, + 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, - 0xb97af807, 0x86fe7e7e, - 0x86ea6a6a, 0x8f6e8378, - 0xb96ee0c2, 0xbf82, - 0xb9780002, 0xbe801f6c, + 0x8e7a8b7a, 0x8977ff77, + 0xfc00, 0x8a77, + 0xba7ff807, 0x, + 0xb8faf812, 0xb8fbf813, + 0x8efa887a, 0xbf0d8f7b, + 0xbf840002, 0x877bff7b, + 0x, 0xc0031c3d, + 0x0010, 0xc0071bbd, + 0x, 0xc0071ebd, + 0x0008, 0xbf8cc07f, + 0x8671ff6d, 0x0100, + 0xbf840004, 0x92f1ff70, + 0x00010001, 0xbf840016, + 0xbf820005, 0x86708170, + 0x8e709770, 0x8977ff77, + 0x0080, 0x8077, + 0x86ee6e6e, 0xbf840001, + 0xbe801d6e, 0x866eff6d, + 0x01ff, 0xbf850005, + 0x8778ff78, 0x2000, + 0x80ec886c, 0x82ed806d, + 0xbf820005, 0x866eff6d, + 0x0100, 0xbf850002, + 0x806c846c, 0x826d806d, 0x866dff6d, 0x, - 0xbefa0080, 0xb97a0283, - 0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xbeee007e, - 0xbeef007f, 0xbefe0180, - 0xbf94, 0x877a8478, - 0xb97af802, 0xbf8e0002, - 0xbf88fffe, 0xb8fa2a05, - 0x807a817a, 0x8e7a8
[PATCH v3 04/24] drm/amdkfd: add pc sampling mutex
Add a per-node PC sampling mutex. Initialize it in node init and destroy it in node cleanup.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 ++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
 	spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+	mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+	mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
 	int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
 	}
 
 	kfd_smi_init(node);
+	kfd_pc_sampling_init(node);
 
 	return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 		kfd_topology_remove_device(knode);
 		if (knode->gws)
 			amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+		kfd_pc_sampling_exit(knode);
 		kfree(knode);
 		kfd->nodes[i] = NULL;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 99426182bfc6..cbaa1bccd94b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+	struct mutex mutex;
+};
+
 struct kfd_node {
 	unsigned int node_id;
 	struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
 	struct kfd_local_mem_info local_mem_info;
 
 	struct kfd_dev *kfd;
+
+	struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1
[PATCH v3 21/24] drm/amdkfd: add pc sampling thread to trigger trap
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 42282f130fc3..c95d9ff08f6a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -39,6 +39,66 @@ struct supported_pc_sample_info supported_formats[] = { { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, }; +static int kfd_pc_sample_thread(void *param) +{ + struct amdgpu_device *adev; + struct kfd_node *node = param; + uint32_t timeout = 0; + + mutex_lock(>pcs_data.mutex); + if (node->pcs_data.hosttrap_entry.base.active_count && + node->pcs_data.hosttrap_entry.base.pc_sample_info.interval && + node->kfd2kgd->trigger_pc_sample_trap) { + switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) { + case KFD_IOCTL_PCS_TYPE_TIME_US: + timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval; + break; + default: + pr_debug("PC Sampling type %d not supported.", + node->pcs_data.hosttrap_entry.base.pc_sample_info.type); + } + } + mutex_unlock(>pcs_data.mutex); + if (!timeout) + return -EINVAL; + + adev = node->adev; + + allow_signal(SIGKILL); + while (!kthread_should_stop() || + !READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) { + node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd, + >pcs_data.hosttrap_entry.base.target_simd, + >pcs_data.hosttrap_entry.base.target_wave_slot, + node->pcs_data.hosttrap_entry.base.pc_sample_info.method); + pr_debug_ratelimited("triggered a host trap."); + + if (signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) + break; + usleep_range(timeout, timeout + 10); + } + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + + return 0; +} + +static int kfd_pc_sample_thread_start(struct 
kfd_node *node) +{ + char thread_name[16]; + int ret = 0; + + snprintf(thread_name, 16, "pcs_%08x", node->adev->ddev.render->index); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = + kthread_run(kfd_pc_sample_thread, node, thread_name); + if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) { + ret = PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + pr_debug("Failed to create pc sample thread for %s.\n", thread_name); + } + + return ret; +} + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { @@ -88,6 +148,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, struct pc_sampling_entry *pcs_entry) { bool pc_sampling_start = false; + int ret = 0; pcs_entry->enabled = true; mutex_lock(>dev->pcs_data.mutex); @@ -102,11 +163,13 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, } else { kfd_process_set_trap_pc_sampling_flag(>qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + if (!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread) + ret = kfd_pc_sample_thread_start(pdd->dev); break; } } - return 0; + return ret; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd, @@ -124,6 +187,9 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, mutex_unlock(>dev->pcs_data.mutex); if (pc_sampling_stop) { + kthread_stop(pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread); + while (pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread) + usleep_range(1000, 2000); kfd_process_set_trap_pc_sampling_flag(>qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); remap_queue(pd
[PATCH v3 20/24] drm/amdkfd: enable pc sampling start
Enable pc sampling start. The first session started on a node waits until any in-progress stop has finished (stop_enable cleared), then sets the trap pc sampling flag.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index c9fd5b2a3330..42282f130fc3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -84,9 +84,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
 	return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+					struct pc_sampling_entry *pcs_entry)
 {
-	return -EINVAL;
+	bool pc_sampling_start = false;
+
+	pcs_entry->enabled = true;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+		pc_sampling_start = true;
+	pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+	while (pc_sampling_start) {
+		if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
+			usleep_range(1000, 2000);
+		} else {
+			kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+				pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+			break;
+		}
+	}
+
+	return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -252,7 +272,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		if (pcs_entry->enabled)
 			return -EALREADY;
 		else
-			return kfd_pc_sample_start(pdd);
+			return kfd_pc_sample_start(pdd, pcs_entry);
 
 	case KFD_IOCTL_PCS_OP_STOP:
 		if (!pcs_entry->enabled)
-- 
2.25.1
[PATCH v3 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
 	return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device *adev,
+					    uint32_t vmid,
+					    uint32_t *target_simd,
+					    uint32_t *target_wave_slot,
+					    enum kfd_ioctl_pc_sample_method method)
+{
+	return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+					target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
+	.trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1
[PATCH v3 19/24] drm/amdkfd: add queue remapping
Add queue remapping to ensure that any waves executing the PC sampling part of the trap handler are done before kfd_pc_sample_stop returns, and that no new waves enter that part of the trap handler afterwards. This avoids race conditions that could lead to use-after-free. Unmapping and remapping the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm)
 	return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+				enum kfd_unmap_queues_filter filter,
+				uint32_t filter_param,
+				uint32_t grace_period)
+{
+	dqm_lock(dqm);
+	if (!dqm->dev->kfd->shared_resources.enable_mes)
+		execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+	dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+				enum kfd_unmap_queues_filter filter,
+				uint32_t filter_param,
+				uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
 	return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 02fa481d7457..c9fd5b2a3330 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
 	uint32_t ip_version;
@@ -105,6 +106,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
 	if (pc_sampling_stop) {
 		kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
 			pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+		remap_queue(pdd->dev->dqm,
+			KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
 
 		mutex_lock(&pdd->dev->pcs_data.mutex);
 		pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
-- 
2.25.1
[PATCH v3 16/24] drm/amdkfd: use bit operation set debug trap
The 2nd byte of the 1st-level TMA holds the trap type setting. Use bit operations on it so that only the selected bit changes.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 71df51fcc1b0..1a31b556a5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1440,13 +1440,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported)
 	return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+	KFD_TRAP_TYPE_DEBUG = 0,	/* bit 0 for debug trap */
+	KFD_TRAP_TYPE_HOST,
+	KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 				     bool enabled)
 {
 	if (qpd->cwsr_kaddr) {
-		uint64_t *tma =
-			(uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-		tma[2] = enabled;
+		volatile unsigned long *tma =
+			(volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
+		if (enabled)
+			set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+		else
+			clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
 	}
 }
 
-- 
2.25.1
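
Note on the change from assignment to bit operations: "tma[2] = enabled" overwrites the whole word, so it would also wipe any other trap-type bit kept there, whereas set_bit()/clear_bit() touch a single bit. The following is only a userspace sketch of that difference, assuming plain non-atomic masks in place of the kernel helpers. The helper trap_flag_update() and the trap_type_bit names are hypothetical and merely mirror the enum added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit offsets mirroring enum KFD_TRAP_TYPE_BIT from the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/*
 * Hypothetical helper: return `word` with exactly one flag changed.
 * Unlike the kernel's set_bit()/clear_bit(), this is not atomic.
 */
static uint64_t trap_flag_update(uint64_t word, enum trap_type_bit bit,
				 bool enabled)
{
	uint64_t mask;

	assert(bit <= TRAP_TYPE_STOCHASTIC);
	mask = UINT64_C(1) << bit;

	return enabled ? (word | mask) : (word & ~mask);
}
```

With this helper, enabling the host trap on a word that already has the debug bit set yields 0x3, and clearing the debug bit afterwards leaves 0x2, so neither flag disturbs the other.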
[PATCH v3 09/24] drm/amdkfd: add interface to trigger pc sampling trap
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..05b0255aca37 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -33,6 +33,7 @@
 #include
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
+#include
 
 struct pci_dev;
 struct amdgpu_device;
@@ -318,6 +319,11 @@ struct kfd2kgd_calls {
 	void (*program_trap_handler_settings)(struct amdgpu_device *adev,
 			uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
 			uint32_t inst);
+	uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+					    uint32_t vmid,
+					    uint32_t *target_simd,
+					    uint32_t *target_wave_slot,
+					    enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif	/* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1
[PATCH v3 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9
Implement trigger pc sampling trap for gfx v9.

Signed-off-by: James Zhu
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev,
 	kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+					    uint32_t vmid,
+					    uint32_t max_wave_slot,
+					    uint32_t max_simd,
+					    uint32_t *target_simd,
+					    uint32_t *target_wave_slot,
+					    enum kfd_ioctl_pc_sample_method method)
+{
+	if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+		uint32_t value = 0;
+
+		value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+		value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE);
+
+		/* select *target_simd */
+		value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+		/* select *target_wave_slot */
+		value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++);
+
+		mutex_lock(&adev->grbm_idx_mutex);
+		amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, 0);
+		WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+		mutex_unlock(&adev->grbm_idx_mutex);
+
+		*target_wave_slot %= max_wave_slot;
+		if (!(*target_wave_slot)) {
+			(*target_simd)++;
+			*target_simd %= max_simd;
+		}
+	} else {
+		pr_debug("PC Sampling method %d not supported.", method);
+		return -EOPNOTSUPP;
+	}
+	return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct amdgpu_device *adev,
 					       uint32_t grace_period,
 					       uint32_t *reg_offset,
 					       uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+					    uint32_t vmid,
+					    uint32_t max_wave_slot,
+					    uint32_t max_simd,
+					    uint32_t *target_simd,
+					    uint32_t *target_wave_slot,
+					    enum kfd_ioctl_pc_sample_method method);
-- 
2.25.1
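
Note on the target selection in kgd_gfx_v9_trigger_pc_sample_trap(): each call traps the current (SIMD, wave slot) pair and then advances the wave slot, moving to the next SIMD once the wave slots wrap. The register write aside, the walk can be sketched in plain C as below. This is an illustration only; struct sample_target and next_target() are hypothetical names, and the patch keeps the two counters in the host-trap entry instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cursor holding the two counters the patch passes by pointer. */
struct sample_target {
	uint32_t simd;
	uint32_t wave_slot;
};

/*
 * Return the (simd, wave_slot) pair to trap now and advance the cursor
 * in the same order as the patch: wave slot first, then SIMD on wrap.
 */
static struct sample_target next_target(struct sample_target *cur,
					uint32_t max_wave_slot,
					uint32_t max_simd)
{
	struct sample_target picked = *cur;

	assert(max_wave_slot > 0 && max_simd > 0);

	cur->wave_slot = (cur->wave_slot + 1) % max_wave_slot;
	if (cur->wave_slot == 0)
		cur->simd = (cur->simd + 1) % max_simd;

	return picked;
}
```

The arcturus and aldebaran wrappers in this series call the gfx v9 function with (max_wave_slot, max_simd) of (10, 4) and (8, 4), so one full sweep takes 40 and 32 triggers respectively.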
[PATCH v3 07/24] drm/amdkfd: check pcs_entry valid
Check that pcs_entry is valid for the pc sampling ioctl. For every op other than QUERY_CAPABILITIES and CREATE, look up the entry by trace_id and reject the call if it does not exist or belongs to another pdd.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 0ea51330acd8..193a8aa94d52 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
+	struct pc_sampling_entry *pcs_entry;
+
+	if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+		args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+				args->trace_id);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+		/* pcs_entry is only for this pc sampling process,
+		 * which has kfd_process->mutex protected here.
+		 */
+		if (!pcs_entry ||
+			pcs_entry->pdd != pdd)
+			return -EINVAL;
+	}
+
 	switch (args->op) {
 	case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
 		return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		return kfd_pc_sample_create(pdd, args);
 
 	case KFD_IOCTL_PCS_OP_DESTROY:
-		return kfd_pc_sample_destroy(pdd, args->trace_id);
+		if (pcs_entry->enabled)
+			return -EBUSY;
+		else
+			return kfd_pc_sample_destroy(pdd, args->trace_id);
 
 	case KFD_IOCTL_PCS_OP_START:
-		return kfd_pc_sample_start(pdd);
+		if (pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_start(pdd);
 
 	case KFD_IOCTL_PCS_OP_STOP:
-		return kfd_pc_sample_stop(pdd);
+		if (!pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_stop(pdd);
 	}
 
 	return -EINVAL;
-- 
2.25.1
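
Note on the per-op checks added to kfd_pc_sample(): once the entry lookup succeeds, whether DESTROY, START or STOP may proceed depends only on the entry's enabled flag. A compact way to read the rules is the table-like function below. It is a sketch under the assumption that nothing else gates these ops; enum pcs_op and pcs_op_check() are hypothetical names, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the KFD_IOCTL_PCS_OP_* values checked here. */
enum pcs_op {
	PCS_OP_DESTROY,
	PCS_OP_START,
	PCS_OP_STOP,
};

/*
 * Return 0 when `op` may proceed for an entry whose enabled flag is
 * `enabled`, or the negative errno the patch returns otherwise.
 */
static int pcs_op_check(enum pcs_op op, bool enabled)
{
	switch (op) {
	case PCS_OP_DESTROY:
		return enabled ? -EBUSY : 0;     /* must be stopped first */
	case PCS_OP_START:
		return enabled ? -EALREADY : 0;  /* already running */
	case PCS_OP_STOP:
		return enabled ? 0 : -EALREADY;  /* already stopped */
	}
	return -EINVAL;
}
```

Read this way, a session has to go START, STOP, DESTROY in that order, and repeating START or STOP is reported with -EALREADY rather than silently accepted.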
[PATCH v3 14/24] drm/amdkfd: trigger pc sampling trap for arcturus
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev,
 
 	return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+					    uint32_t vmid,
+					    uint32_t *target_simd,
+					    uint32_t *target_wave_slot,
+					    enum kfd_ioctl_pc_sample_method method)
+{
+	return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+					target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
 	.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
 	.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
 	.build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info,
 	.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings
+	.program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
+	.trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1
[PATCH v3 08/24] drm/amdkfd: enable pc sampling destroy
Enable pc sampling destroy.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 193a8aa94d52..07e4c4a32e7b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -169,10 +169,24 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd,
 
 	return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id,
+					struct pc_sampling_entry *pcs_entry)
 {
-	return -EINVAL;
+	pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+		pcs_entry, trace_id, pdd->dev->id);
+
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+	idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id);
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+		memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0,
+			sizeof(struct kfd_pc_sample_info));
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+	kvfree(pcs_entry);
+
+	return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -207,7 +221,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		if (pcs_entry->enabled)
 			return -EBUSY;
 		else
-			return kfd_pc_sample_destroy(pdd, args->trace_id);
+			return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry);
 
 	case KFD_IOCTL_PCS_OP_START:
 		if (pcs_entry->enabled)
-- 
2.25.1
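
Note on use_count: the create path in this series stores pc_sample_info when use_count is zero and then increments it, and the destroy path above decrements it and clears pc_sample_info when it drops back to zero. A stripped-down sketch of that lifetime follows, with locking and the IDR left out. struct sample_cfg, struct node_sampling and both functions are hypothetical names, and the sketch does not model any compatibility check between sessions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for kfd_pc_sample_info, reduced to one field. */
struct sample_cfg {
	uint64_t interval;
};

struct node_sampling {
	uint32_t use_count;      /* number of live sessions on the node */
	struct sample_cfg cfg;   /* configuration shared by all sessions */
};

/* Create: only the first session fixes the shared configuration. */
static void session_create(struct node_sampling *n,
			   const struct sample_cfg *want)
{
	if (n->use_count == 0)
		n->cfg = *want;
	n->use_count++;
}

/* Destroy: the last session clears it so a later creator starts fresh. */
static void session_destroy(struct node_sampling *n)
{
	assert(n->use_count > 0);
	if (--n->use_count == 0)
		memset(&n->cfg, 0, sizeof(n->cfg));
}
```

The point of the memset is that a stale configuration from a finished profiling run cannot leak into the next one.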
[PATCH v3 00/24] Support Host Trap Sampling for gfx941/gfx942
PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling.

David Yat Sin (5):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create
  drm/amdkfd: set debug trap bit when enabling PC Sampling

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |    7 +
 drivers/gpu/drm/amd/amdkfd/Makefile           |    3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   73 +-
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c        |   22 +
 drivers/gpu/drm/amd/amdkfd/kfd_debug.h        |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  405
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   37 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |    2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |    5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |    6 +
 include/uapi/linux/kfd_ioctl.h                |   60 +-
 21 files changed, 1881 insertions(+), 1080 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
-- 
2.25.1
[PATCH v3 06/24] drm/amdkfd: add trace_id return
Add trace_id return for new pc sampling creation per device, Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 6 ++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0e24e011f66b..bcaeedac8fe0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev) static void kfd_pc_sampling_init(struct kfd_node *dev) { mutex_init(>pcs_data.mutex); + idr_init_base(>pcs_data.hosttrap_entry.base.pc_sampling_idr, 1); } static void kfd_pc_sampling_exit(struct kfd_node *dev) { + idr_destroy(>pcs_data.hosttrap_entry.base.pc_sampling_idr); mutex_destroy(>pcs_data.mutex); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 106fac0ba1b3..0ea51330acd8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, { struct kfd_pc_sample_info *supported_format = NULL; struct kfd_pc_sample_info user_info; + struct pc_sampling_entry *pcs_entry; int ret; int i; @@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return ret ? 
-EFAULT : -EEXIST; } - /* TODO: add trace_id return */ + pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL); + if (!pcs_entry) { + mutex_unlock(>dev->pcs_data.mutex); + return -ENOMEM; + } + + i = idr_alloc_cyclic(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + pcs_entry, 1, 0, GFP_KERNEL); + if (i < 0) { + mutex_unlock(>dev->pcs_data.mutex); + kfree(pcs_entry); + return i; + } if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; @@ -148,6 +161,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, pdd->dev->pcs_data.hosttrap_entry.base.use_count++; mutex_unlock(>dev->pcs_data.mutex); + pcs_entry->pdd = pdd; + user_args->trace_id = (uint32_t)i; + + pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id); + return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index db2d09db8000..7ca7cc726246 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -271,6 +271,7 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; @@ -756,6 +757,11 @@ enum kfd_pdd_bound { */ #define SDMA_ACTIVITY_DIVISOR 100 +struct pc_sampling_entry { + bool enabled; + struct kfd_process_device *pdd; +}; + /* Data that is per-process-per device. */ struct kfd_process_device { /* The device that owns this data. */ -- 2.25.1
[PATCH v3 03/24] drm/amdkfd: enable pc sampling query
From: David Yat Sin

Enable pc sampling to query system capability.

Co-developed-by: James Zhu
Signed-off-by: James Zhu
Signed-off-by: David Yat Sin
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..987c415f8f0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,62 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+	uint32_t ip_version;
+	const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+	0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+	{ IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+	{ IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *user_args)
 {
-	return -EINVAL;
+	uint64_t sample_offset;
+	int num_method = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+		if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+			num_method++;
+
+	if (!num_method) {
+		pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+			pdd->dev->adev->ip_versions[GC_HWIP][0]);
+		return -EOPNOTSUPP;
+	}
+
+	if (!user_args->sample_info_ptr || !user_args->num_sample_info) {
+		user_args->num_sample_info = num_method;
+		return 0;
+	}
+
+	if (user_args->num_sample_info < num_method) {
+		user_args->num_sample_info = num_method;
+		pr_debug("Sample info buffer is not large enough, "
+			 "ASIC requires space for %d kfd_pc_sample_info entries.", num_method);
+		return -ENOSPC;
+	}
+
+	sample_offset = user_args->sample_info_ptr;
+	for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+		if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) {
+			int ret = copy_to_user((void __user *) sample_offset,
+				supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info));
+			if (ret) {
+				pr_debug("Failed to copy PC sampling info to user.");
+				return -EFAULT;
+			}
+			sample_offset += sizeof(struct kfd_pc_sample_info);
+		}
+	}
+
+	return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1
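
Note on the calling convention of kfd_pc_sample_query_cap(): a caller that passes no buffer (or a zero count) gets the required entry count back, a caller whose buffer is too small gets -ENOSPC together with the required count, and only a large enough buffer is filled. The userspace sketch below restates that convention with memcpy() standing in for copy_to_user(). struct sample_info and query_cap() are hypothetical, simplified names and do not match the real uapi layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct kfd_pc_sample_info. */
struct sample_info {
	uint32_t method;
	uint32_t type;
};

/*
 * Mirror of the size-query-then-fill convention in the patch.
 * `supported`/`num_supported` play the role of the matching formats,
 * `buf`/`num_inout` the role of sample_info_ptr/num_sample_info.
 */
static int query_cap(const struct sample_info *supported,
		     uint32_t num_supported,
		     struct sample_info *buf, uint32_t *num_inout)
{
	assert(num_inout != NULL);

	if (num_supported == 0)
		return -EOPNOTSUPP;

	/* Size query: report how many entries the caller must provide. */
	if (buf == NULL || *num_inout == 0) {
		*num_inout = num_supported;
		return 0;
	}

	/* Buffer too small: report the required count and fail. */
	if (*num_inout < num_supported) {
		*num_inout = num_supported;
		return -ENOSPC;
	}

	memcpy(buf, supported, num_supported * sizeof(*supported));
	return 0;
}
```

A caller would typically call once with a NULL buffer to learn the count, allocate that many entries, and call again to fetch them.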
[PATCH v3 02/24] drm/amdkfd: add pc sampling support
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 5 files changed, 171 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index a5ae7bcf44eb..790fd028a681 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -57,7 +57,8 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ - $(AMDKFD_PATH)/kfd_debug.o + $(AMDKFD_PATH)/kfd_debug.o \ + $(AMDKFD_PATH)/kfd_pc_sampling.o ifneq ($(CONFIG_DEBUG_FS),) AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index f6d4748c1980..1a3a8ded9c93 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -41,6 +41,7 @@ #include "kfd_priv.h" #include "kfd_device_queue_manager.h" #include "kfd_svm.h" +#include "kfd_pc_sampling.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" #include "amdgpu_dma_buf.h" @@ -1750,6 +1751,38 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) } #endif +static int kfd_ioctl_pc_sample(struct file *filep, + struct kfd_process *p, void __user *data) +{ + struct kfd_ioctl_pc_sample_args *args = data; + struct kfd_process_device *pdd; + int ret; + + if (sched_policy == KFD_SCHED_POLICY_NO_HWS) { + pr_err("PC Sampling does not support sched_policy %i", sched_policy); + return -EINVAL; + } + + 
mutex_lock(>mutex); + pdd = kfd_process_device_data_by_id(p, args->gpu_id); + + if (!pdd) { + pr_debug("could not find gpu id 0x%x.", args->gpu_id); + ret = -EINVAL; + } else { + pdd = kfd_bind_process_to_device(pdd->dev, p); + if (IS_ERR(pdd)) { + pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); + ret = -ESRCH; + } else { + ret = kfd_pc_sample(pdd, args); + } + } + mutex_unlock(>mutex); + + return ret; +} + static int criu_checkpoint_process(struct kfd_process *p, uint8_t __user *user_priv_data, uint64_t *priv_offset) @@ -3224,6 +3257,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = { AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP, kfd_ioctl_set_debug_trap, 0), + + AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE, + kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON), }; #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls) @@ -3300,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) } } + /* PC Sampling Monitor */ + if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) { + if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) { + retcode = -EACCES; + goto err_i1; + } + } + if (cmd & (IOC_IN | IOC_OUT)) { if (asize <= sizeof(stack_kdata)) { kdata = stack_kdata; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c new file mode 100644 index ..a7e78ff42d07 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright 2023 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission
[PATCH v3 05/24] drm/amdkfd: enable pc sampling create
From: David Yat Sin

Enable pc sampling create.

Co-developed-by: James Zhu
Signed-off-by: James Zhu
Signed-off-by: David Yat Sin
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        | 10
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 987c415f8f0f..106fac0ba1b3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,58 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *user_args)
 {
-	return -EINVAL;
+	struct kfd_pc_sample_info *supported_format = NULL;
+	struct kfd_pc_sample_info user_info;
+	int ret;
+	int i;
+
+	if (user_args->num_sample_info != 1)
+		return -EINVAL;
+
+	ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr,
+				sizeof(struct kfd_pc_sample_info));
+	if (ret) {
+		pr_debug("Failed to copy PC sampling info from user\n");
+		return -EFAULT;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+		if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+			&& user_info.method == supported_formats[i].sample_info->method
+			&& user_info.type == supported_formats[i].sample_info->type
+			&& user_info.interval <= supported_formats[i].sample_info->interval_max
+			&& user_info.interval >= supported_formats[i].sample_info->interval_min) {
+			supported_format =
+				(struct kfd_pc_sample_info *)supported_formats[i].sample_info;
+			break;
+		}
+	}
+
+	if (!supported_format) {
+		pr_debug("Sampling format is not supported!");
+		return -EOPNOTSUPP;
+	}
+
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+		memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+				&user_info, sizeof(user_info))) {
+		ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+			&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+			sizeof(struct kfd_pc_sample_info));
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		return ret ? -EFAULT : -EEXIST;
+	}
+
+	/* TODO: add trace_id return */
+
+	if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+		pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info;
+
+	pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+	return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbaa1bccd94b..db2d09db8000 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+	uint32_t use_count;         /* Num of PC sampling sessions */
+	struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+	struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
 	struct mutex mutex;
+	struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1
[PATCH v3 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support
From: David Yat Sin

Add pc sampling support in kfd_ioctl.

The user mode code which uses this new kfd_ioctl is available on the
master branch of https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface.

Co-developed-by: James Zhu
Signed-off-by: James Zhu
Signed-off-by: David Yat Sin
---
 include/uapi/linux/kfd_ioctl.h | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f0ed68974c54..1bd1347effea 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
 	};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE:             Register this process with a per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY:            Unregister from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START:              Process begins taking samples from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP:               Process stops taking samples from a previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+	KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+	KFD_IOCTL_PCS_OP_CREATE,
+	KFD_IOCTL_PCS_OP_DESTROY,
+	KFD_IOCTL_PCS_OP_START,
+	KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+	KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+	KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+	KFD_IOCTL_PCS_TYPE_TIME_US,
+	KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+	KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+	__u64 interval;      /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us
+	                      * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles
+	                      * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by
+	                      * graphics compute units
+	                      */
+	__u64 interval_min;  /* [OUT] */
+	__u64 interval_max;  /* [OUT] */
+	__u64 flags;         /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */
+	__u32 method;        /* [IN/OUT] kfd_ioctl_pc_sample_method */
+	__u32 type;          /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+	__u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+	__u32 num_sample_info;
+	__u32 op;                /* kfd_ioctl_pc_sample_op */
+	__u32 gpu_id;
+	__u32 trace_id;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)			_IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)		_IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP			\
 		AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE		\
+		AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x27
+#define AMDKFD_COMMAND_END		0x28
 
 #endif
-- 
2.25.1
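The [OUT] fields of kfd_pc_sample_info (interval_min, interval_max, flags) describe what a device accepts. A minimal sketch of how a caller might interpret them before requesting a session follows. The struct pcs_caps and interval_allowed() names are hypothetical, and the power-of-2 check is an assumption about how the flag is meant to be read; nothing in the patch itself enforces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors KFD_IOCTL_PCS_FLAG_POWER_OF_2 from the patch. */
#define PCS_FLAG_POWER_OF_2 0x0001ULL

/* Hypothetical container for the [OUT] fields returned by a capability query. */
struct pcs_caps {
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
};

/* Returns true if the requested interval fits the advertised limits. */
static bool interval_allowed(const struct pcs_caps *caps, uint64_t interval)
{
	if (interval < caps->interval_min || interval > caps->interval_max)
		return false;

	/* Assumed reading of the flag: the interval must be a non-zero power of 2. */
	if ((caps->flags & PCS_FLAG_POWER_OF_2) &&
	    (interval == 0 || (interval & (interval - 1)) != 0))
		return false;

	return true;
}
```

With the host trap entry from this series (min 1, max ~0ULL, flags 0), any non-zero interval passes this check.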
Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout
On 2023-12-13 11:23, Felix Kuehling wrote:

On 2023-12-13 10:24, James Zhu wrote:

Ping ...

On 2023-12-08 18:01, James Zhu wrote:

When application tries to allocate all system memory and cause memory to swap out. Needs more time for hmm_range_fault to validate the remaining page for allocation. To be safe, increase timeout value to 1 second for 64MB range.

Signed-off-by: James Zhu

This is not the first time we're incrementing this timeout. Eventually we should get rid of that and find a way to make this work reliably without a timeout. There can always be situations where faults take longer, and we should not fail randomly in those cases. There are also some FIXMEs in this code that should be addressed at the same time.

That said, as a short-term fix, this patch is

[JZ] Yes, it is just a short-term fix. The root cause is still under study.

Acked-by: Felix Kuehling

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 		pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
 			hmm_range->start, hmm_range->end);
 
-		/* Assuming 128MB takes maximum 1 second to fault page address */
-		timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+		/* Assuming 64MB takes maximum 1 second to fault page address */
+		timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
 		timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
 		timeout = jiffies + msecs_to_jiffies(timeout);
Re: [PATCH v2 03/23] drm/amdkfd: enable pc sampling query
On 2023-12-12 19:55, Yat Sin, David wrote: [AMD Official Use Only - General] -Original Message- From: Zhu, James Sent: Thursday, December 7, 2023 5:54 PM To:amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix; Greathouse, Joseph ; Yat Sin, David; Zhu, James Subject: [PATCH v2 03/23] drm/amdkfd: enable pc sampling query From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index a7e78ff42d07..49fecbc7013e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -25,10 +25,62 @@ #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +struct supported_pc_sample_info { + uint32_t ip_version; + const struct kfd_pc_sample_info *sample_info; }; + +const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = { + 0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, +KFD_IOCTL_PCS_TYPE_TIME_US }; + +struct supported_pc_sample_info supported_formats[] = { + { IP_VERSION(9, 4, 1), _info_hosttrap_9_0_0 }, + { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, }; + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + uint64_t sample_offset; + int num_method = 0; + int i; + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) + num_method++; + + if (!num_method) { + pr_debug("PC Sampling not supported on GC_HWIP:0x%x.", + pdd->dev->adev->ip_versions[GC_HWIP][0]); + return -EOPNOTSUPP; + } + + if (!user_args->sample_info_ptr) { Should be: if (!user_args->sample_info_ptr || !user_args->num_sample_info) { + user_args->num_sample_info = num_method; + return 0; + } + + if 
(user_args->num_sample_info < num_method) { + user_args->num_sample_info = num_method; + pr_debug("Sample info buffer is not large enough, " + "ASIC requires space for %d kfd_pc_sample_info entries.", num_method); + return -ENOSPC; + } + + sample_offset = user_args->sample_info_ptr; If there is another active PC Sampling session that is active, I thought we were planning to have code to return a reduced list with only the methods that are compatible with the current active session. Did we decide to drop this behavior? [JZ] Do we have design changed here? I though we allow sharing the sameactive PC Sampling session between multiple processes. Regards, David + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) { + int ret = copy_to_user((void __user *) sample_offset, + supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info to user."); + return -EFAULT; + } + sample_offset += sizeof(struct kfd_pc_sample_info); + } + } + + return 0; } static int kfd_pc_sample_start(struct kfd_process_device *pdd) -- 2.25.1
Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout
Ping ... On 2023-12-08 18:01, James Zhu wrote: When application tries to allocate all system memory and cause memory to swap out. Needs more time for hmm_range_fault to validate the remaining page for allocation. To be safe, increase timeout value to 1 second for 64MB range. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index 081267161d40..b24eb5821fd1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, pr_debug("hmm range: start = 0x%lx, end = 0x%lx", hmm_range->start, hmm_range->end); - /* Assuming 128MB takes maximum 1 second to fault page address */ - timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL); + /* Assuming 64MB takes maximum 1 second to fault page address */ + timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL); timeout *= HMM_RANGE_DEFAULT_TIMEOUT; timeout = jiffies + msecs_to_jiffies(timeout);
[PATCH v2 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages
Only schedule when hmm_range_fault returns error.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index b24eb5821fd1..55b65fc04b65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 		hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
 		r = hmm_range_fault(hmm_range);
 		if (unlikely(r)) {
+			schedule();
 			/*
 			 * FIXME: This timeout should encompass the retry from
 			 * mmu_interval_read_retry() as well.
@@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 			break;
 		hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
 		hmm_range->start = hmm_range->end;
-		schedule();
 	} while (hmm_range->end < end);
 
 	hmm_range->start = start;
-- 
2.25.1
[PATCH v3 00/23] Support Host Trap Sampling for gfx941/gfx942
PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling.

The user mode code which uses this new kfd_ioctl is available at https://github.com/zhums/ROCT-Thunk-Interface/tree/zhums/ROCT-Thunk.

David Yat Sin (4):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_entry valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |    7 +
 drivers/gpu/drm/amd/amdkfd/Makefile           |    3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   44 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  372 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |    2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |    5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |    6 +
 include/uapi/linux/kfd_ioctl.h                |   60 +-
 19 files changed, 1813 insertions(+), 1059 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1
[PATCH v3 07/23] drm/amdkfd: check pcs_entry valid
Check pcs_entry valid for pc sampling ioctl.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b44dfea15539..e5aa87b2da4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
+	struct pc_sampling_entry *pcs_entry;
+
+	if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+		args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+				args->trace_id);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+		/* pcs_entry is only for this pc sampling process,
+		 * which has kfd_process->mutex protected here.
+		 */
+		if (!pcs_entry ||
+			pcs_entry->pdd != pdd)
+			return -EINVAL;
+	}
+
 	switch (args->op) {
 	case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
 		return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		return kfd_pc_sample_create(pdd, args);
 
 	case KFD_IOCTL_PCS_OP_DESTROY:
-		return kfd_pc_sample_destroy(pdd, args->trace_id);
+		if (pcs_entry->enabled)
+			return -EBUSY;
+		else
+			return kfd_pc_sample_destroy(pdd, args->trace_id);
 
 	case KFD_IOCTL_PCS_OP_START:
-		return kfd_pc_sample_start(pdd);
+		if (pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_start(pdd);
 
 	case KFD_IOCTL_PCS_OP_STOP:
-		return kfd_pc_sample_stop(pdd);
+		if (!pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_stop(pdd);
 	}
 
 	return -EINVAL;
-- 
2.25.1
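The per-operation checks this patch adds can be read as a small state table keyed on whether the session is enabled. A userspace sketch of that table follows, as an illustration only: enum pcs_op, struct session and check_op() are hypothetical names, and a NULL session stands in for a trace_id that is not found or belongs to another process.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the five ioctl operations. */
enum pcs_op { OP_QUERY, OP_CREATE, OP_DESTROY, OP_START, OP_STOP };

/* Hypothetical stand-in for struct pc_sampling_entry. */
struct session {
	bool enabled;
};

/*
 * Returns 0 when the operation may proceed, or the errno the patch
 * would return. QUERY and CREATE do not need an existing session.
 */
static int check_op(enum pcs_op op, const struct session *s)
{
	if (op == OP_QUERY || op == OP_CREATE)
		return 0;

	if (!s)
		return -EINVAL;		/* unknown or foreign trace_id */

	switch (op) {
	case OP_DESTROY:
		return s->enabled ? -EBUSY : 0;		/* must stop first */
	case OP_START:
		return s->enabled ? -EALREADY : 0;	/* already running */
	case OP_STOP:
		return s->enabled ? 0 : -EALREADY;	/* already stopped */
	default:
		return -EINVAL;
	}
}
```

The expected lifecycle is therefore create, start, stop, destroy; destroying a running session is rejected rather than stopping it implicitly.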
Re: [PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942
Ping ... On 2023-12-07 17:53, James Zhu wrote: PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling. David Yat Sin (4): drm/amdkfd/kfd_ioctl: add pc sampling support drm/amdkfd: add pc sampling support drm/amdkfd: enable pc sampling query drm/amdkfd: enable pc sampling create James Zhu (19): drm/amdkfd: add pc sampling mutex drm/amdkfd: add trace_id return drm/amdkfd: check pcs_enrty valid drm/amdkfd: enable pc sampling destroy drm/amdkfd: add interface to trigger pc sampling trap drm/amdkfd: trigger pc sampling trap for gfx v9 drm/amdkfd/gfx9: enable host trap drm/amdgpu: use trapID 4 for host trap drm/amdgpu: add sq host trap status check drm/amdkfd: trigger pc sampling trap for arcturus drm/amdkfd: trigger pc sampling trap for aldebaran drm/amdkfd: use bit operation set debug trap drm/amdkfd: add setting trap pc sampling flag drm/amdkfd: enable pc sampling stop drm/amdkfd: add queue remapping drm/amdkfd: enable pc sampling start drm/amdkfd: add pc sampling thread to trigger trap drm/amdkfd: add pc sampling release when process release drm/amdkfd: bump kfd ioctl minor version for pc sampling availability .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 + .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 14 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 73 + .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 + drivers/gpu/drm/amd/amdkfd/Makefile |3 +- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 29 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 + drivers/gpu/drm/amd/amdkfd/kfd_device.c | 14 + .../drm/amd/amdkfd/kfd_device_queue_manager.c | 11 + .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 + drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 372 +++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 35 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 43 + 
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 32 +- .../amd/include/asic_reg/gc/gc_9_0_offset.h |2 + .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h |5 + .../gpu/drm/amd/include/kgd_kfd_interface.h |6 + include/uapi/linux/kfd_ioctl.h| 60 +- 19 files changed, 1813 insertions(+), 1059 deletions(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
Re: [PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages
On 2023-12-11 05:38, Christian König wrote: Am 09.12.23 um 00:01 schrieb James Zhu: Needn't do schedule for each hmm_range_fault, and use cond_resched to replace schedule. cond_resched() is usually NAKed upstream since it is a NO-OP in most situations. [JZ] then let me change back to schedule(); Thanks! IIRC there was even a patch set to completely remove it. Christian. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index b24eb5821fd1..c77c4eceea46 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c @@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, hmm_range->notifier_seq = mmu_interval_read_begin(notifier); r = hmm_range_fault(hmm_range); if (unlikely(r)) { + cond_resched(); /* * FIXME: This timeout should encompass the retry from * mmu_interval_read_retry() as well. @@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier, break; hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT; hmm_range->start = hmm_range->end; - schedule(); } while (hmm_range->end < end); hmm_range->start = start;
[PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout
When an application tries to allocate all system memory and causes memory to be swapped out, hmm_range_fault needs more time to validate the remaining pages for allocation. To be safe, increase the timeout to 1 second per 64MB of range.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 081267161d40..b24eb5821fd1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 		pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
 			hmm_range->start, hmm_range->end);
 
-		/* Assuming 128MB takes maximum 1 second to fault page address */
-		timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
+		/* Assuming 64MB takes maximum 1 second to fault page address */
+		timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
 		timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
 		timeout = jiffies + msecs_to_jiffies(timeout);
-- 
2.25.1
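To make the effect of changing the shift from 27 to 26 concrete, the timeout arithmetic can be reproduced outside the kernel. This is a sketch under assumptions: range_timeout_ms() is a hypothetical helper, and DEFAULT_TIMEOUT_MS assumes HMM_RANGE_DEFAULT_TIMEOUT is 1000 (milliseconds), which should be checked against include/linux/hmm.h in the tree being patched.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of HMM_RANGE_DEFAULT_TIMEOUT, in milliseconds. */
#define DEFAULT_TIMEOUT_MS 1000ULL

/*
 * Timeout budget in milliseconds for faulting [start, end):
 * one DEFAULT_TIMEOUT_MS per 2^shift bytes, with a floor of one unit.
 * shift = 27 is the old 128MB granularity, shift = 26 the new 64MB one.
 */
static uint64_t range_timeout_ms(uint64_t start, uint64_t end, unsigned int shift)
{
	uint64_t units = (end - start) >> shift;

	if (units < 1)
		units = 1;
	return units * DEFAULT_TIMEOUT_MS;
}
```

For a 256MB range this gives 2 units with the old shift and 4 units with the new one, so the budget doubles; ranges smaller than 64MB keep the single-unit floor either way.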
[PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages
There is no need to call schedule() after every hmm_range_fault. Yield only when hmm_range_fault returns an error, and use cond_resched() instead of schedule().

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index b24eb5821fd1..c77c4eceea46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -199,6 +199,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 		hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
 		r = hmm_range_fault(hmm_range);
 		if (unlikely(r)) {
+			cond_resched();
 			/*
 			 * FIXME: This timeout should encompass the retry from
 			 * mmu_interval_read_retry() as well.
@@ -212,7 +213,6 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
 			break;
 		hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
 		hmm_range->start = hmm_range->end;
-		schedule();
 	} while (hmm_range->end < end);
 
 	hmm_range->start = start;
-- 
2.25.1
[PATCH v2 22/23] drm/amdkfd: add pc sampling release when process release
Add pc sampling release on process release. It forces all active sessions of the process to stop.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c     |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 04cc25c79a76..a05dd8b1a7da 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -300,6 +300,27 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 	return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+	struct pc_sampling_entry *pcs_entry;
+	struct idr *idp;
+	uint32_t id;
+
+	/* force to release all PC sampling task for this process */
+	idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	idr_for_each_entry(idp, pcs_entry, id) {
+		if (pcs_entry->pdd != pdd)
+			continue;
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		if (pcs_entry->enabled)
+			kfd_pc_sample_stop(pdd, pcs_entry);
+		kfd_pc_sample_destroy(pdd, id, pcs_entry);
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+	}
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6bc9dcfad484..1f8d6098dfb2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n",
 				pdd->dev->id, p->pasid);
 
+		kfd_pc_sample_release(pdd);
+
 		kfd_process_device_destroy_cwsr_dgpu(pdd);
 		kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1
[PATCH v2 23/23] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability
Bump the minor version to declare pc sampling feature is now available.

Signed-off-by: James Zhu
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1bd1347effea..62d8642d3d1c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -40,9 +40,10 @@
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
+ * - 1.15 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 14
+#define KFD_IOCTL_MINOR_VERSION 15
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.25.1
[PATCH v2 21/23] drm/amdkfd: add pc sampling thread to trigger trap
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 49b5d4c9f7e0..04cc25c79a76 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -39,6 +39,66 @@ struct supported_pc_sample_info supported_formats[] = { { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, }; +static int kfd_pc_sample_thread(void *param) +{ + struct amdgpu_device *adev; + struct kfd_node *node = param; + uint32_t timeout = 0; + + mutex_lock(>pcs_data.mutex); + if (node->pcs_data.hosttrap_entry.base.active_count && + node->pcs_data.hosttrap_entry.base.pc_sample_info.interval && + node->kfd2kgd->trigger_pc_sample_trap) { + switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) { + case KFD_IOCTL_PCS_TYPE_TIME_US: + timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval; + break; + default: + pr_debug("PC Sampling type %d not supported.", + node->pcs_data.hosttrap_entry.base.pc_sample_info.type); + } + } + mutex_unlock(>pcs_data.mutex); + if (!timeout) + return -EINVAL; + + adev = node->adev; + + allow_signal(SIGKILL); + while (!kthread_should_stop() || + !READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) { + node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd, + >pcs_data.hosttrap_entry.base.target_simd, + >pcs_data.hosttrap_entry.base.target_wave_slot, + node->pcs_data.hosttrap_entry.base.pc_sample_info.method); + pr_debug_ratelimited("triggered a host trap."); + + if (signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) + break; + usleep_range(timeout, timeout + 10); + } + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + + return 0; +} + +static int kfd_pc_sample_thread_start(struct 
kfd_node *node) +{ + char thread_name[32]; + int ret = 0; + + snprintf(thread_name, 32, "pc_sampling_%08x", node->id); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = + kthread_run(kfd_pc_sample_thread, node, thread_name); + if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) { + ret = PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread); + node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL; + pr_debug("Failed to create pc sample thread for %s.\n", thread_name); + } + + return ret; +} + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { @@ -88,6 +148,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, struct pc_sampling_entry *pcs_entry) { bool pc_sampling_start = false; + int ret = 0; pcs_entry->enabled = true; mutex_lock(>dev->pcs_data.mutex); @@ -102,11 +163,13 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, } else { kfd_process_set_trap_pc_sampling_flag(>qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + if (!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread) + ret = kfd_pc_sample_thread_start(pdd->dev); break; } } - return 0; + return ret; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd, @@ -124,6 +187,9 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, mutex_unlock(>dev->pcs_data.mutex); if (pc_sampling_stop) { + kthread_stop(pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread); + while (pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread) + usleep_range(1000, 2000); kfd_process_set_trap_pc_sampling_flag(>qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); remap_queue(pdd->dev->dqm, d
[PATCH v2 12/23] drm/amdgpu: use trapID 4 for host trap
TRAPSTS.HOST_TRAP does not work before gfx943, so use TTMP1 (bit 24: HT; bits 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 + 3 files changed, 1070 insertions(+), 1054 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 7d8c0e13ac12..adfe5e5585e5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); /* select *target_wave_slot */ value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + /* set TrapID 4 for HOSTTRAP */ + value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4); mutex_lock(&adev->grbm_idx_mutex); amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 747426bd5181..44955838f307 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf82025e, + 0xbf820001, 0xbf820263, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, - 0x00ff, 0xbf85001e, + 0x00ff, 0xbf850023, 0x866eff7b, 0x0400, - 0xbf85005b, 0xbf8e0010, + 0xbf850060, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, - 0xbf850015, 0x866eff7b, - 0x71ff, 0xbf840008, - 0x866fff7b, 0x7080, - 0xbf840001, 0xbeee1a87, - 0xb8eff801, 0x8e6e8c6e, - 0x866e6f6e, 0xbf85000a, - 0x866eff6d, 0x00ff, - 0xbf850007, 0xb8eef801, - 0x866eff6e, 0x0800, - 0xbf850003, 0x866eff7b, - 0x0400, 0xbf850040, - 
0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xb8faf812, - 0xb8fbf813, 0x8efa887a, - 0xbf0d8f7b, 0xbf840002, - 0x877bff7b, 0x, - 0xc0031c3d, 0x0010, - 0xc0071bbd, 0x, - 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x8671ff6d, - 0x0100, 0xbf840004, - 0x92f1ff70, 0x00010001, - 0xbf840016, 0xbf820005, - 0x86708170, 0x8e709770, - 0x8977ff77, 0x0080, - 0x8077, 0x86ee6e6e, - 0xbf840001, 0xbe801d6e, - 0x866eff6d, 0x01ff, - 0xbf850005, 0x8778ff78, - 0x2000, 0x80ec886c, - 0x82ed806d, 0xbf820005, - 0x866eff6d, 0x0100, - 0xbf850002, 0x806c846c, - 0x826d806d, 0x866dff6d, - 0x, 0x8f7a8b77, + 0xbf85001a, 0x866eff6d, + 0x01ff, 0xbf06ff6e, + 0x0104, 0xbf850015, + 0x866eff7b, 0x71ff, + 0xbf840008, 0x866fff7b, + 0x7080, 0xbf840001, + 0xbeee1a87, 0xb8eff801, + 0x8e6e8c6e, 0x866e6f6e, + 0xbf85000a, 0x866eff6d, + 0x00ff, 0xbf850007, + 0xb8eef801, 0x866eff6e, + 0x0800, 0xbf850003, + 0x866eff7b, 0x0400, + 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, - 0xb97af807, 0x86fe7e7e, - 0x86ea6a6a, 0x8f6e8378, - 0xb96ee0c2, 0xbf82, - 0xb9780002, 0xbe801f6c, + 0x8e7a8b7a, 0x8977ff77, + 0xfc00, 0x8a77, + 0xba7ff807, 0x, + 0xb8faf812, 0xb8fbf813, + 0x8efa887a, 0xbf0d8f7b, + 0xbf840002, 0x877bff7b, + 0x, 0xc0031c3d, + 0x0010, 0xc0071bbd, + 0x, 0xc0071ebd, + 0x0008, 0xbf8cc07f, + 0x8671ff6d, 0x0100, + 0xbf840004, 0x92f1ff70, + 0x00010001, 0xbf840016, + 0xbf820005, 0x86708170, + 0x8e709770, 0x8977ff77, + 0x0080, 0x8077, + 0x86ee6e6e, 0xbf840001, + 0xbe801d6e, 0x866eff6d, + 0x01ff, 0xbf850005, + 0x8778ff78, 0x2000, + 0x80ec886c, 0x82ed806d, + 0xbf820005, 0x866eff6d, + 0x0100, 0xbf850002, + 0x806c846c, 0x826d806d, 0x866dff6d, 0x, - 0xbefa0080, 0xb97a0283, - 0xb8faf807, 0x867aff7a, - 0x001f8000, 0x8e7a8b7a, - 0x8977ff77, 0xfc00, - 0x8a77, 0xba7ff807, - 0x, 0xbeee007e, - 0xbeef007f, 0xbefe0180, - 0xbf94, 0x877a8478, - 0xb97af802, 0xbf8e0002, - 0xbf88fffe, 0xb8fa2a05, - 0x807a817a, 0x8e7a8
[PATCH v2 19/23] drm/amdkfd: add queue remapping
Add queue remapping to ensure that any waves executing the PC sampling part of the trap handler are done before kfd_pc_sample_stop returns, and that no new waves enter that part of the trap handler afterwards. This avoids race conditions that could lead to use-after-free. Unmapping and remapping the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++ drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 5 + drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 3 +++ 3 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index c0e71543389a..a3f57be63f4f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager *dqm) return debug_map_and_unlock(dqm); } +void remap_queue(struct device_queue_manager *dqm, + enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period) +{ + dqm_lock(dqm); + if (!dqm->dev->kfd->shared_resources.enable_mes) + execute_queues_cpsch(dqm, filter, filter_param, grace_period); + dqm_unlock(dqm); +} + #if defined(CONFIG_DEBUG_FS) static void seq_reg_dump(struct seq_file *m, diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index cf7e182588f8..f8aae3747a36 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm); int debug_map_and_unlock(struct device_queue_manager *dqm); int debug_refresh_runlist(struct device_queue_manager *dqm); +void remap_queue(struct device_queue_manager *dqm, + 
enum kfd_unmap_queues_filter filter, + uint32_t filter_param, + uint32_t grace_period); + static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd) { return (pdd->lds_base >> 16) & 0xFF; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 29a6f9f40f83..7d0722498bf5 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -24,6 +24,7 @@ #include "kfd_priv.h" #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +#include "kfd_device_queue_manager.h" struct supported_pc_sample_info { uint32_t ip_version; @@ -105,6 +106,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd, if (pc_sampling_stop) { kfd_process_set_trap_pc_sampling_flag(>qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); + remap_queue(pdd->dev->dqm, + KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD); mutex_lock(>dev->pcs_data.mutex); pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0; -- 2.25.1
[PATCH v2 15/23] drm/amdkfd: trigger pc sampling trap for aldebaran
Implement trigger pc sampling trap for aldebaran. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index aff08321e976..27eda75ceecb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( return watch_address_cntl; } +static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4, + target_simd, target_wave_slot, method); +} + const struct kfd2kgd_calls aldebaran_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = { .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings, + .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap, }; -- 2.25.1
[PATCH v2 16/23] drm/amdkfd: use bit operation set debug trap
The 2nd byte of the 1st-level TMA is used for the trap type setting. Use bit operations so that only the selected bit is changed. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 71df51fcc1b0..1a31b556a5ff 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1440,13 +1440,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool supported) return true; } +/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */ +enum KFD_TRAP_TYPE_BIT { + KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */ + KFD_TRAP_TYPE_HOST, + KFD_TRAP_TYPE_STOCHASTIC, +}; + void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, bool enabled) { if (qpd->cwsr_kaddr) { - uint64_t *tma = - (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); - tma[2] = enabled; + volatile unsigned long *tma = + (volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); + if (enabled) + set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]); + else + clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]); } } -- 2.25.1
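To illustrate why the patch above moves from a plain assignment to bit operations, here is a minimal userspace sketch. It is not the kernel code: `trap_flag_update` and `enum trap_type_bit` are hypothetical names, the word is a plain `uint64_t`, and the update is not atomic, unlike the kernel's set_bit()/clear_bit(). It only shows that changing one flag leaves the neighbouring flags in the same word untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirroring enum KFD_TRAP_TYPE_BIT from the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/*
 * Hypothetical stand-in for set_bit()/clear_bit(): flips only the
 * selected bit in *word. Not atomic; a sketch only.
 */
static void trap_flag_update(uint64_t *word, enum trap_type_bit bit,
			     bool enabled)
{
	uint64_t mask;

	assert((unsigned int)bit < 64);
	mask = UINT64_C(1) << bit;

	if (enabled)
		*word |= mask;
	else
		*word &= ~mask;
}
```

With the old `tma[2] = enabled;` form, enabling or disabling the debug trap would overwrite the whole word, including any host-trap or stochastic bit that another user had set.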
[PATCH v2 18/23] drm/amdkfd: enable pc sampling stop
Enable pc sampling stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 +++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 18fe06d712c5..29a6f9f40f83 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -88,10 +88,32 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd) return -EINVAL; } -static int kfd_pc_sample_stop(struct kfd_process_device *pdd) +static int kfd_pc_sample_stop(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_stop = false; + + pcs_entry->enabled = false; + mutex_lock(>dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.active_count--; + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) { + WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, true); + pc_sampling_stop = true; + } + mutex_unlock(>dev->pcs_data.mutex); + if (pc_sampling_stop) { + kfd_process_set_trap_pc_sampling_flag(>qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false); + + mutex_lock(>dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0; + pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0; + WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, false); + mutex_unlock(>dev->pcs_data.mutex); + } + + return 0; } static int kfd_pc_sample_create(struct kfd_process_device *pdd, @@ -233,7 +255,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (!pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_stop(pdd); + return kfd_pc_sample_stop(pdd, pcs_entry); } return -EINVAL; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index b9a36891d099..0839a0ca3099 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -271,6 +271,10 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + uint32_t active_count; /* Num of active sessions */ + uint32_t target_simd; /* target simd for trap */ + uint32_t target_wave_slot; /* target wave slot for trap */ + bool stop_enable; /* pc sampling stop in process */ struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; -- 2.25.1
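The stop path above only tears sampling down when the last active session stops. A minimal userspace sketch of that counting rule follows. The names `pcs_state`, `pcs_start` and `pcs_stop` are hypothetical, locking is omitted (the patch holds pcs_data.mutex around the counter), and the `trap_flag` field merely stands in for the TMA pc sampling bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pcs_state {
	uint32_t active_count; /* sessions currently started */
	bool trap_flag;        /* stand-in for the TMA pc sampling bit */
};

/* Returns true when this call turned sampling on (first active session). */
static bool pcs_start(struct pcs_state *s)
{
	bool first = (s->active_count == 0);

	s->active_count++;
	if (first)
		s->trap_flag = true;
	return first;
}

/* Returns true when this call turned sampling off (last active session). */
static bool pcs_stop(struct pcs_state *s)
{
	assert(s->active_count > 0); /* stop without a matching start */

	s->active_count--;
	if (s->active_count == 0) {
		s->trap_flag = false;
		return true;
	}
	return false;
}
```

The sketch leaves out the stop_enable handshake, which in the patch keeps a concurrent start from re-enabling the flag while a stop is still in progress.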
[PATCH v2 17/23] drm/amdkfd: add setting trap pc sampling flag
Add setting trap pc sampling flag. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 + 2 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 7ca7cc726246..b9a36891d099 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd, uint64_t tma_addr); void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, bool enabled); +void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd, +enum kfd_ioctl_pc_sample_method method, bool enabled); /* CWSR initialization */ int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 1a31b556a5ff..6bc9dcfad484 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -1460,6 +1460,19 @@ void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd, } } +void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd, +enum kfd_ioctl_pc_sample_method method, bool enabled) +{ + if (qpd->cwsr_kaddr) { + volatile unsigned long *tma = + (volatile unsigned long *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET); + if (enabled) + set_bit(method, [2]); + else + clear_bit(method, [2]); + } +} + /* * On return the kfd_process is fully operational and will be freed when the * mm is released -- 2.25.1
[PATCH v2 20/23] drm/amdkfd: enable pc sampling start
Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 7d0722498bf5..49b5d4c9f7e0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -84,9 +84,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_start(struct kfd_process_device *pdd) +static int kfd_pc_sample_start(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_start = false; + + pcs_entry->enabled = true; + mutex_lock(>dev->pcs_data.mutex); + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) + pc_sampling_start = true; + pdd->dev->pcs_data.hosttrap_entry.base.active_count++; + mutex_unlock(>dev->pcs_data.mutex); + + while (pc_sampling_start) { + if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) { + usleep_range(1000, 2000); + } else { + kfd_process_set_trap_pc_sampling_flag(>qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + break; + } + } + + return 0; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd, @@ -252,7 +272,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_start(pdd); + return kfd_pc_sample_start(pdd, pcs_entry); case KFD_IOCTL_PCS_OP_STOP: if (!pcs_entry->enabled) -- 2.25.1
[PATCH v2 11/23] drm/amdkfd/gfx9: enable host trap
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index df75863393fc..747426bd5181 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = { static const uint32_t cwsr_trap_gfx9_hex[] = { - 0xbf820001, 0xbf820258, + 0xbf820001, 0xbf82025e, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = { }; static const uint32_t cwsr_trap_arcturus_hex[] = { - 0xbf820001, 0xbf8202d4, + 0xbf820001, 0xbf8202da, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 
0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f, + 0xc0031c3d, 0x0010, + 0xc0071bbd, 0x, 0xc0071ebd, 0x0008, - 0xbf8cc07f, 0x86ee6e6e, + 0xbf8cc07f, 0x8671ff6d, + 0x0100, 0xbf840004, + 0x92f1ff70, 0x00010001, + 0xbf840016, 0xbf820005, + 0x86708170, 0x8e709770, + 0x8977ff77, 0x0080, + 0x8077, 0x86ee6e6e, 0xbf840001, 0xbe801d6e, 0x866eff6d, 0x01ff, 0xbf850005, 0x8778ff78, @@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { }; static const uint32_t cwsr_trap_aldebaran_hex[] = { - 0xbf820001, 0xbf8202df, + 0xbf820001, 0xbf8202e5, 0xb8f8f802, 0x8978ff78, 0x00020006, 0xb8fbf803, 0x866eff78, 0x2000, 0xbf840009, 0x866eff6d, 0x00ff, 0xbf85001e, 0x866eff7b, 0x0400, - 0xbf850055, 0xbf8e0010, + 0xbf85005b, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, @@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xbf850007, 0xb8eef801, 0x866eff6e, 0x0800, 0xbf850003, 0x866eff7b, - 0x0400, 0xbf85003a, + 0x0400, 0xbf850040, 0xb8faf807, 0x867aff7a, 0x001f8000, 0x8e7a8b7a, 0x8977ff77, 0xfc00, @@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0xb8fbf813, 0x8efa887a, 0xbf0d8f7b, 0xbf840002, 0x877bff7b, 0x, - 0xc0031bbd, 0x0010, - 0xbf8cc07f, 0x8e6e976e, - 0x8977ff77, 0x0080, - 0x87776e77, 0xc0071bbd, - 0x, 0xbf8cc07f
[PATCH v2 13/23] drm/amdgpu: add sq host trap status check
Before firing a new host trap, check the host trap status. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++ .../amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++ .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 5 +++ 3 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index adfe5e5585e5..43edd62df5fe 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev) +{ + uint32_t sq_hosttrap_status = 0x0; + int i, j; + + mutex_lock(&adev->grbm_idx_mutex); + for (i = 0; i < adev->gfx.config.max_shader_engines; i++) { + for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) { + amdgpu_gfx_select_se_sh(adev, i, j, 0x, 0); + sq_hosttrap_status = RREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS); + + if (sq_hosttrap_status & SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) { + WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS, + SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK); + sq_hosttrap_status = 0x0; + continue; + } + if (sq_hosttrap_status) + goto out; + } + } + +out: + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + mutex_unlock(&adev->grbm_idx_mutex); + + return sq_hosttrap_status; +} + uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, uint32_t vmid, uint32_t max_wave_slot, @@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, { if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { uint32_t value = 0; + uint32_t sq_hosttrap_status = 0x0; + + sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev); + /* skip when last host trap request is still pending to complete */ + if (sq_hosttrap_status) + return 0; value = REG_SET_FIELD(value, SQ_CMD, CMD, 
SQ_IND_CMD_CMD_TRAP); value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h index 12d451e5475b..5b17d9066452 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h @@ -462,6 +462,8 @@ #define mmSQ_IND_DATA_BASE_IDX 0 #define mmSQ_CMD 0x037b #define mmSQ_CMD_BASE_IDX 0 +#define mmSQ_HOSTTRAP_STATUS 0x0376 +#define mmSQ_HOSTTRAP_STATUS_BASE_IDX 0 #define mmSQ_TIME_HI 0x037c #define mmSQ_TIME_HI_BASE_IDX 0 #define mmSQ_TIME_LO 0x037d diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h index efc16ddf274a..3dfe4ab31421 100644 --- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h +++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h @@ -2616,6 +2616,11 @@ //SQ_CMD_TIMESTAMP #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 0x0 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK 0x00FFL +//SQ_HOSTTRAP_STATUS +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT 0x0 +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT 0x8 +#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK 0x00FFL +#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK
[PATCH v2 09/23] drm/amdkfd: add interface to trigger pc sampling trap
Add interface to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index 6d094cf3587d..05b0255aca37 100644 --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h @@ -33,6 +33,7 @@ #include #include "amdgpu_irq.h" #include "amdgpu_gfx.h" +#include struct pci_dev; struct amdgpu_device; @@ -318,6 +319,11 @@ struct kfd2kgd_calls { void (*program_trap_handler_settings)(struct amdgpu_device *adev, uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr, uint32_t inst); + uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method); }; #endif /* KGD_KFD_INTERFACE_H_INCLUDED */ -- 2.25.1
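The new trigger_pc_sample_trap member is an optional callback: only ASIC tables that implement it fill it in, and the sampling thread elsewhere in this series checks the pointer before using it. A minimal userspace sketch of that pattern follows. All names here (`fake_dev`, `fake_calls`, `fake_trigger`, `try_trigger`) are hypothetical, the callback signature is shortened for illustration, and returning -EOPNOTSUPP for a missing callback is an assumption of this sketch rather than what the series does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct fake_dev {
	uint32_t traps_fired; /* counts calls, in place of a register write */
};

struct fake_calls {
	/* Optional: left NULL by tables that do not support host traps. */
	uint32_t (*trigger_pc_sample_trap)(struct fake_dev *dev, uint32_t vmid);
};

/* Hypothetical per-ASIC implementation. */
static uint32_t fake_trigger(struct fake_dev *dev, uint32_t vmid)
{
	(void)vmid;
	dev->traps_fired++;
	return 0;
}

/* Calls the hook only when the table provides one. */
static int try_trigger(const struct fake_calls *calls, struct fake_dev *dev,
		       uint32_t vmid)
{
	assert(calls != NULL && dev != NULL);

	if (!calls->trigger_pc_sample_trap)
		return -EOPNOTSUPP;
	return (int)calls->trigger_pc_sample_trap(dev, vmid);
}
```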
[PATCH v2 14/23] drm/amdkfd: trigger pc sampling trap for arcturus
Implement trigger pc sampling trap for arcturus. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c index 0ba15dcbe4e1..10b362e072a6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c @@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev, return 0; } + +static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4, + target_simd, target_wave_slot, method); +} + const struct kfd2kgd_calls arcturus_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = { .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy, - .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings + .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings, + .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap }; -- 2.25.1
[PATCH v2 08/23] drm/amdkfd: enable pc sampling destroy
Enable pc sampling destroy. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index e5aa87b2da4f..18fe06d712c5 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -169,10 +169,24 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) +static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", + pcs_entry, trace_id, pdd->dev->id); + + mutex_lock(>dev->pcs_data.mutex); + pdd->dev->pcs_data.hosttrap_entry.base.use_count--; + idr_remove(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, trace_id); + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + memset(>dev->pcs_data.hosttrap_entry.base.pc_sample_info, 0x0, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + + kvfree(pcs_entry); + + return 0; } int kfd_pc_sample(struct kfd_process_device *pdd, @@ -207,7 +221,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EBUSY; else - return kfd_pc_sample_destroy(pdd, args->trace_id); + return kfd_pc_sample_destroy(pdd, args->trace_id, pcs_entry); case KFD_IOCTL_PCS_OP_START: if (pcs_entry->enabled) -- 2.25.1
[PATCH v2 10/23] drm/amdkfd: trigger pc sampling trap for gfx v9
Implement trigger pc sampling trap for gfx v9. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 2 files changed, 43 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c index 5a35a8ca8922..7d8c0e13ac12 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c @@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct amdgpu_device *adev, kgd_gfx_v9_unlock_srbm(adev, inst); } +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method) +{ + if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) { + uint32_t value = 0; + + value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP); + value = REG_SET_FIELD(value, SQ_CMD, MODE, SQ_IND_CMD_MODE_SINGLE); + + /* select *target_simd */ + value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd); + /* select *target_wave_slot */ + value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, (*target_wave_slot)++); + + mutex_lock(>grbm_idx_mutex); + amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0); + WREG32_SOC15(GC, 0, mmSQ_CMD, value); + mutex_unlock(>grbm_idx_mutex); + + *target_wave_slot %= max_wave_slot; + if (!(*target_wave_slot)) { + (*target_simd)++; + *target_simd %= max_simd; + } + } else { + pr_debug("PC Sampling method %d not supported.", method); + return -EOPNOTSUPP; + } + return 0; +} + const struct kfd2kgd_calls gfx_v9_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h index ce424615f59b..b47b926891a8 100644 --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h @@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct amdgpu_device *adev, uint32_t grace_period, uint32_t *reg_offset, uint32_t *reg_data); +uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev, + uint32_t vmid, + uint32_t max_wave_slot, + uint32_t max_simd, + uint32_t *target_simd, + uint32_t *target_wave_slot, + enum kfd_ioctl_pc_sample_method method); -- 2.25.1
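The target selection in kgd_gfx_v9_trigger_pc_sample_trap walks the wave slots of one SIMD and then moves to the next SIMD, wrapping around at the end. The register write aside, the stepping logic can be sketched in userspace as below. `advance_target` is a hypothetical helper name, and the limits passed by a caller are whatever the ASIC wrapper supplies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the (simd, wave slot) pair the same way the patch does after
 * each trap: bump the wave slot, wrap it at max_wave_slot, and step to
 * the next SIMD (wrapping at max_simd) whenever the slot wraps to 0.
 */
static void advance_target(uint32_t max_wave_slot, uint32_t max_simd,
			   uint32_t *target_simd, uint32_t *target_wave_slot)
{
	assert(max_wave_slot > 0 && max_simd > 0);

	*target_wave_slot = (*target_wave_slot + 1) % max_wave_slot;
	if (*target_wave_slot == 0)
		*target_simd = (*target_simd + 1) % max_simd;
}
```

With 8 wave slots and 4 SIMDs, for example, the pair returns to (0, 0) after 32 steps, so every slot on every SIMD is targeted once per cycle.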
[PATCH v2 06/23] drm/amdkfd: add trace_id return
Return a trace_id for each new PC sampling session created on a device. Use an IDR to quickly locate the pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 6 ++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0e24e011f66b..bcaeedac8fe0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev) static void kfd_pc_sampling_init(struct kfd_node *dev) { mutex_init(&dev->pcs_data.mutex); + idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1); } static void kfd_pc_sampling_exit(struct kfd_node *dev) { + idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr); mutex_destroy(&dev->pcs_data.mutex); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 7828a6340edf..b44dfea15539 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, { struct kfd_pc_sample_info *supported_format = NULL; struct kfd_pc_sample_info user_info; + struct pc_sampling_entry *pcs_entry; int ret; int i; @@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return ret ? 
-EFAULT : -EEXIST; } - /* TODO: add trace_id return */ + pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL); + if (!pcs_entry) { + mutex_unlock(>dev->pcs_data.mutex); + return -ENOMEM; + } + + i = idr_alloc_cyclic(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + pcs_entry, 1, 0, GFP_KERNEL); + if (i < 0) { + mutex_unlock(>dev->pcs_data.mutex); + kfree(pcs_entry); + return i; + } if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; @@ -148,6 +161,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, pdd->dev->pcs_data.hosttrap_entry.base.use_count++; mutex_unlock(>dev->pcs_data.mutex); + pcs_entry->pdd = pdd; + user_args->trace_id = (uint32_t)i; + + pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id); + return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index db2d09db8000..7ca7cc726246 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -271,6 +271,7 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; @@ -756,6 +757,11 @@ enum kfd_pdd_bound { */ #define SDMA_ACTIVITY_DIVISOR 100 +struct pc_sampling_entry { + bool enabled; + struct kfd_process_device *pdd; +}; + /* Data that is per-process-per device. */ struct kfd_process_device { /* The device that owns this data. */ -- 2.25.1
[PATCH v2 03/23] drm/amdkfd: enable pc sampling query
From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index a7e78ff42d07..49fecbc7013e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -25,10 +25,62 @@ #include "amdgpu_amdkfd.h" #include "kfd_pc_sampling.h" +struct supported_pc_sample_info { + uint32_t ip_version; + const struct kfd_pc_sample_info *sample_info; +}; + +const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = { + 0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, KFD_IOCTL_PCS_TYPE_TIME_US }; + +struct supported_pc_sample_info supported_formats[] = { + { IP_VERSION(9, 4, 1), _info_hosttrap_9_0_0 }, + { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, +}; + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + uint64_t sample_offset; + int num_method = 0; + int i; + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) + num_method++; + + if (!num_method) { + pr_debug("PC Sampling not supported on GC_HWIP:0x%x.", + pdd->dev->adev->ip_versions[GC_HWIP][0]); + return -EOPNOTSUPP; + } + + if (!user_args->sample_info_ptr) { + user_args->num_sample_info = num_method; + return 0; + } + + if (user_args->num_sample_info < num_method) { + user_args->num_sample_info = num_method; + pr_debug("Sample info buffer is not large enough, " +"ASIC requires space for %d kfd_pc_sample_info entries.", num_method); + return -ENOSPC; + } + + sample_offset = user_args->sample_info_ptr; + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version) { + int 
ret = copy_to_user((void __user *) sample_offset, + supported_formats[i].sample_info, sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info to user."); + return -EFAULT; + } + sample_offset += sizeof(struct kfd_pc_sample_info); + } + } + + return 0; } static int kfd_pc_sample_start(struct kfd_process_device *pdd) -- 2.25.1
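The query path above follows a two-call size negotiation: a call with no buffer returns only the required entry count, and a call with a too-small buffer fails with -ENOSPC after writing back the required count. A minimal user-space model of that protocol is sketched below. The names `struct sample_info`, `struct query_args` and `query_cap()` are hypothetical stand-ins, not the real uapi structs or the driver function, and `memcpy` stands in for `copy_to_user`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kfd_pc_sample_info. */
struct sample_info {
	uint64_t interval, interval_min, interval_max, flags;
	uint32_t method, type;
};

/* Hypothetical stand-in for the query-related fields of the ioctl args. */
struct query_args {
	struct sample_info *buf; /* NULL on the count-only call */
	uint32_t num;            /* in: capacity of buf, out: entries required */
};

/* One supported format, with the same values as sample_info_hosttrap_9_0_0. */
static const struct sample_info supported[] = {
	{ 0, 1, ~0ULL, 0, 1 /* HOSTTRAP */, 0 /* TIME_US */ },
};
#define NUM_SUPPORTED (sizeof(supported) / sizeof(supported[0]))

/* Model of the negotiation performed by kfd_pc_sample_query_cap(). */
static int query_cap(struct query_args *a)
{
	if (!a->buf) {                /* count-only call */
		a->num = NUM_SUPPORTED;
		return 0;
	}
	if (a->num < NUM_SUPPORTED) { /* buffer too small: report the need */
		a->num = NUM_SUPPORTED;
		return -ENOSPC;
	}
	memcpy(a->buf, supported, sizeof(supported));
	return 0;
}
```

A caller would typically query with a NULL buffer first, allocate `num` entries, and call again.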
[PATCH v2 05/23] drm/amdkfd: enable pc sampling create
From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 2 files changed, 62 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 49fecbc7013e..7828a6340edf 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -97,7 +97,58 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd) static int kfd_pc_sample_create(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + struct kfd_pc_sample_info *supported_format = NULL; + struct kfd_pc_sample_info user_info; + int ret; + int i; + + if (user_args->num_sample_info != 1) + return -EINVAL; + + ret = copy_from_user(_info, (void __user *) user_args->sample_info_ptr, + sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info from user\n"); + return -EFAULT; + } + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version + && user_info.method == supported_formats[i].sample_info->method + && user_info.type == supported_formats[i].sample_info->type + && user_info.interval <= supported_formats[i].sample_info->interval_max + && user_info.interval >= supported_formats[i].sample_info->interval_min) { + supported_format = + (struct kfd_pc_sample_info *)supported_formats[i].sample_info; + break; + } + } + + if (!supported_format) { + pr_debug("Sampling format is not supported!"); + return -EOPNOTSUPP; + } + + mutex_lock(>dev->pcs_data.mutex); + if (pdd->dev->pcs_data.hosttrap_entry.base.use_count && + memcmp(>dev->pcs_data.hosttrap_entry.base.pc_sample_info, + _info, sizeof(user_info))) { + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + 
>dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(>dev->pcs_data.mutex); + return ret ? -EFAULT : -EEXIST; + } + + /* TODO: add trace_id return */ + + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; + + pdd->dev->pcs_data.hosttrap_entry.base.use_count++; + mutex_unlock(>dev->pcs_data.mutex); + + return 0; } static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index cbaa1bccd94b..db2d09db8000 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -269,9 +269,19 @@ struct kfd_vmid_info { struct kfd_dev; +struct kfd_dev_pc_sampling_data { + uint32_t use_count; /* Num of PC sampling sessions */ + struct kfd_pc_sample_info pc_sample_info; +}; + +struct kfd_dev_pcs_hosttrap { + struct kfd_dev_pc_sampling_data base; +}; + /* Per device PC Sampling data */ struct kfd_dev_pc_sampling { struct mutex mutex; + struct kfd_dev_pcs_hosttrap hosttrap_entry; }; struct kfd_node { -- 2.25.1
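The create path above accepts a request only if it matches a supported format, and only if it agrees with any configuration already active on the device. The sketch below models that decision in plain C. `struct sample_info`, `struct device_state`, `format_matches()` and `create_session()` are illustrative names, not driver symbols, and the per-device mutex held by the real code is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct kfd_pc_sample_info (no implicit padding). */
struct sample_info {
	uint64_t interval, interval_min, interval_max, flags;
	uint32_t method, type;
};

/* Hypothetical stand-in for the per-device session bookkeeping. */
struct device_state {
	uint32_t use_count;        /* number of sessions created so far */
	struct sample_info active; /* configuration shared by all sessions */
};

/* A request matches when method and type agree and the interval is in range. */
static int format_matches(const struct sample_info *req,
			  const struct sample_info *fmt)
{
	return req->method == fmt->method && req->type == fmt->type &&
	       req->interval >= fmt->interval_min &&
	       req->interval <= fmt->interval_max;
}

static int create_session(struct device_state *dev,
			  const struct sample_info *fmt,
			  struct sample_info *req)
{
	if (!format_matches(req, fmt))
		return -EOPNOTSUPP;

	/* A second session must use exactly the configuration already active. */
	if (dev->use_count && memcmp(&dev->active, req, sizeof(*req))) {
		*req = dev->active; /* report the active configuration */
		return -EEXIST;
	}

	if (!dev->use_count)
		dev->active = *req;
	dev->use_count++;
	return 0;
}
```

Returning the active configuration together with -EEXIST lets a second client retry with matching settings.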
[PATCH v2 07/23] drm/amdkfd: check pcs_enrty valid
Check pcs_entry valid for pc sampling ioctl.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b44dfea15539..e5aa87b2da4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
 					struct kfd_ioctl_pc_sample_args __user *args)
 {
+	struct pc_sampling_entry *pcs_entry;
+
+	if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+		args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+		pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+				args->trace_id);
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+		/* pcs_entry is only for this pc sampling process,
+		 * which has kfd_process->mutex protected here.
+		 */
+		if (!pcs_entry ||
+			pcs_entry->pdd != pdd)
+			return -EINVAL;
+	}
+
 	switch (args->op) {
 	case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
 		return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
 		return kfd_pc_sample_create(pdd, args);
 
 	case KFD_IOCTL_PCS_OP_DESTROY:
-		return kfd_pc_sample_destroy(pdd, args->trace_id);
+		if (pcs_entry->enabled)
+			return -EBUSY;
+		else
+			return kfd_pc_sample_destroy(pdd, args->trace_id);
 
 	case KFD_IOCTL_PCS_OP_START:
-		return kfd_pc_sample_start(pdd);
+		if (pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_start(pdd);
 
 	case KFD_IOCTL_PCS_OP_STOP:
-		return kfd_pc_sample_stop(pdd);
+		if (!pcs_entry->enabled)
+			return -EALREADY;
+		else
+			return kfd_pc_sample_stop(pdd);
 	}
 
 	return -EINVAL;
--
2.25.1
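The checks this patch adds amount to a small state table: QUERY and CREATE need no existing session, while the other operations need a session owned by the caller and in the right enabled state. The sketch below restates that table as a standalone function. `enum pcs_op`, `struct entry` and `check_op()` are hypothetical names, and a return value of 0 here only means "the check passed", not that the operation itself succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the five ioctl operations. */
enum pcs_op { OP_QUERY, OP_CREATE, OP_DESTROY, OP_START, OP_STOP };

/* Hypothetical stand-in for struct pc_sampling_entry. */
struct entry {
	bool enabled;      /* session currently sampling */
	const void *owner; /* stands in for the pdd pointer */
};

/* Returns 0 if the operation may proceed, or a negative errno. */
static int check_op(enum pcs_op op, const struct entry *e, const void *caller)
{
	if (op == OP_QUERY || op == OP_CREATE)
		return 0;       /* no existing session required */

	if (!e || e->owner != caller)
		return -EINVAL; /* unknown trace_id or another owner's session */

	switch (op) {
	case OP_DESTROY:
		return e->enabled ? -EBUSY : 0;    /* must stop first */
	case OP_START:
		return e->enabled ? -EALREADY : 0; /* already started */
	case OP_STOP:
		return e->enabled ? 0 : -EALREADY; /* already stopped */
	default:
		return -EINVAL;
	}
}
```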
[PATCH v2 04/23] drm/amdkfd: add pc sampling mutex
Add pc sampling mutex per node, and do init/destroy in node init.

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
 	spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+	mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+	mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
 	int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
 	}
 
 	kfd_smi_init(node);
+	kfd_pc_sampling_init(node);
 
 	return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned int num_nodes)
 		kfd_topology_remove_device(knode);
 		if (knode->gws)
 			amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+		kfd_pc_sampling_exit(knode);
 		kfree(knode);
 		kfd->nodes[i] = NULL;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 99426182bfc6..cbaa1bccd94b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+	struct mutex mutex;
+};
+
 struct kfd_node {
 	unsigned int node_id;
 	struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
 	struct kfd_local_mem_info local_mem_info;
 
 	struct kfd_dev *kfd;
+
+	struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
--
2.25.1
[PATCH v2 01/23] drm/amdkfd/kfd_ioctl: add pc sampling support
From: David Yat Sin Add pc sampling support in kfd_ioctl. The user mode code which uses this new kfd_ioctl is linked to https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface with master branch. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include/uapi/linux/kfd_ioctl.h | 57 +- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index f0ed68974c54..1bd1347effea 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args { }; }; +/** + * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations + * + * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities + * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance + * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance + */ +enum kfd_ioctl_pc_sample_op { + KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES, + KFD_IOCTL_PCS_OP_CREATE, + KFD_IOCTL_PCS_OP_DESTROY, + KFD_IOCTL_PCS_OP_START, + KFD_IOCTL_PCS_OP_STOP, +}; + +/* Values have to be a power of 2*/ +#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001 + +enum kfd_ioctl_pc_sample_method { + KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1, + KFD_IOCTL_PCS_METHOD_STOCHASTIC, +}; + +enum kfd_ioctl_pc_sample_type { + KFD_IOCTL_PCS_TYPE_TIME_US, + KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES, + KFD_IOCTL_PCS_TYPE_INSTRUCTIONS +}; + +struct kfd_pc_sample_info { + __u64 interval; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us + * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles + * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by + * graphics compute units + */ + __u64 interval_min; /* [OUT] 
*/ + __u64 interval_max; /* [OUT] */ + __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */ + __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */ + __u32 type; /* [IN/OUT] kfd_ioctl_pc_sample_type */ +}; + +struct kfd_ioctl_pc_sample_args { + __u64 sample_info_ptr; /* array of kfd_pc_sample_info */ + __u32 num_sample_info; + __u32 op;/* kfd_ioctl_pc_sample_op */ + __u32 gpu_id; + __u32 trace_id; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_DBG_TRAP\ AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args) +#define AMDKFD_IOC_PC_SAMPLE \ + AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x27 +#define AMDKFD_COMMAND_END 0x28 #endif -- 2.25.1
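Since these structs cross the user/kernel boundary, it can help to confirm that their layout has no implicit padding. The sketch below mirrors the two structs with fixed-width types and checks sizes and offsets at compile time. The struct names `pc_sample_info` and `pc_sample_args` are local mirrors written for this illustration, not the uapi header itself, and the expected sizes are derived only from the field list in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct kfd_pc_sample_info from the patch. */
struct pc_sample_info {
	uint64_t interval;
	uint64_t interval_min;
	uint64_t interval_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Local mirror of struct kfd_ioctl_pc_sample_args from the patch. */
struct pc_sample_args {
	uint64_t sample_info_ptr;
	uint32_t num_sample_info;
	uint32_t op;
	uint32_t gpu_id;
	uint32_t trace_id;
};

/* Four u64 plus two u32 fields: 40 bytes, already a multiple of 8. */
_Static_assert(sizeof(struct pc_sample_info) == 40, "unexpected padding");
_Static_assert(offsetof(struct pc_sample_info, method) == 32, "bad offset");

/* One u64 plus four u32 fields: 24 bytes, already a multiple of 8. */
_Static_assert(sizeof(struct pc_sample_args) == 24, "unexpected padding");
_Static_assert(offsetof(struct pc_sample_args, trace_id) == 20, "bad offset");

/* Runtime accessors so the sizes can also be checked from a test. */
static size_t info_size(void) { return sizeof(struct pc_sample_info); }
static size_t args_size(void) { return sizeof(struct pc_sample_args); }
```

Placing the 64-bit members first and ending on an 8-byte boundary is what keeps the size independent of the ABI's alignment rules for 64-bit integers.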
[PATCH v2 02/23] drm/amdkfd: add pc sampling support
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 5 files changed, 171 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile index a5ae7bcf44eb..790fd028a681 100644 --- a/drivers/gpu/drm/amd/amdkfd/Makefile +++ b/drivers/gpu/drm/amd/amdkfd/Makefile @@ -57,7 +57,8 @@ AMDKFD_FILES := $(AMDKFD_PATH)/kfd_module.o \ $(AMDKFD_PATH)/kfd_int_process_v11.o \ $(AMDKFD_PATH)/kfd_smi_events.o \ $(AMDKFD_PATH)/kfd_crat.o \ - $(AMDKFD_PATH)/kfd_debug.o + $(AMDKFD_PATH)/kfd_debug.o \ + $(AMDKFD_PATH)/kfd_pc_sampling.o ifneq ($(CONFIG_DEBUG_FS),) AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index f6d4748c1980..1a3a8ded9c93 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -41,6 +41,7 @@ #include "kfd_priv.h" #include "kfd_device_queue_manager.h" #include "kfd_svm.h" +#include "kfd_pc_sampling.h" #include "amdgpu_amdkfd.h" #include "kfd_smi_events.h" #include "amdgpu_dma_buf.h" @@ -1750,6 +1751,38 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data) } #endif +static int kfd_ioctl_pc_sample(struct file *filep, + struct kfd_process *p, void __user *data) +{ + struct kfd_ioctl_pc_sample_args *args = data; + struct kfd_process_device *pdd; + int ret; + + if (sched_policy == KFD_SCHED_POLICY_NO_HWS) { + pr_err("PC Sampling does not support sched_policy %i", sched_policy); + return -EINVAL; + } + + 
mutex_lock(>mutex); + pdd = kfd_process_device_data_by_id(p, args->gpu_id); + + if (!pdd) { + pr_debug("could not find gpu id 0x%x.", args->gpu_id); + ret = -EINVAL; + } else { + pdd = kfd_bind_process_to_device(pdd->dev, p); + if (IS_ERR(pdd)) { + pr_debug("failed to bind process %p with gpu id 0x%x", p, args->gpu_id); + ret = -ESRCH; + } else { + ret = kfd_pc_sample(pdd, args); + } + } + mutex_unlock(>mutex); + + return ret; +} + static int criu_checkpoint_process(struct kfd_process *p, uint8_t __user *user_priv_data, uint64_t *priv_offset) @@ -3224,6 +3257,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = { AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP, kfd_ioctl_set_debug_trap, 0), + + AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE, + kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON), }; #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls) @@ -3300,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) } } + /* PC Sampling Monitor */ + if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) { + if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) { + retcode = -EACCES; + goto err_i1; + } + } + if (cmd & (IOC_IN | IOC_OUT)) { if (asize <= sizeof(stack_kdata)) { kdata = stack_kdata; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c new file mode 100644 index ..a7e78ff42d07 --- /dev/null +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -0,0 +1,78 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright 2023 Advanced Micro Devices, Inc. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission
[PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942
PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling.

David Yat Sin (4):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_enrty valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  | 11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   | 14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 +
 drivers/gpu/drm/amd/amdkfd/Makefile           | 3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h    | 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 44 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       | 14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  | 372 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  | 35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   | 2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  | 5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   | 6 +
 include/uapi/linux/kfd_ioctl.h                | 60 +-
 19 files changed, 1813 insertions(+), 1059 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

--
2.25.1
Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support
On 2023-11-27 14:11, Alex Deucher wrote: On Fri, Nov 3, 2023 at 9:22 AM James Zhu wrote: From: David Yat Sin Add pc sampling support in kfd_ioctl. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin For any new IOCTL interfaces, please provide a link to the user mode code branch which uses it in the patch description. [JZ] will add, Thanks! Thanks, Alex --- include/uapi/linux/kfd_ioctl.h | 57 +- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index f0ed68974c54..5202e29c9560 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args { }; }; +/** + * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations + * + * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities + * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance + * @KFD_IOCTL_PCS_OP_DESTROY:Unregister from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance + */ +enum kfd_ioctl_pc_sample_op { + KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES, + KFD_IOCTL_PCS_OP_CREATE, + KFD_IOCTL_PCS_OP_DESTROY, + KFD_IOCTL_PCS_OP_START, + KFD_IOCTL_PCS_OP_STOP, +}; + +/* Values have to be a power of 2*/ +#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001 + +enum kfd_ioctl_pc_sample_method { + KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1, + KFD_IOCTL_PCS_METHOD_STOCHASTIC, +}; + +enum kfd_ioctl_pc_sample_type { + KFD_IOCTL_PCS_TYPE_TIME_US, + KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES, + KFD_IOCTL_PCS_TYPE_INSTRUCTIONS +}; + +struct kfd_pc_sample_info { + __u64 value; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us + * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles + * if PCS_TYPE_INSTRUCTIONS: 
sample interval in instructions issued by + * graphics compute units + */ + __u64 value_min; /* [OUT] */ + __u64 value_max; /* [OUT] */ + __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */ + __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */ + __u32 type; /* [IN/OUT] kfd_ioctl_pc_sample_type */ +}; + +struct kfd_ioctl_pc_sample_args { + __u64 sample_info_ptr; /* array of kfd_pc_sample_info */ + __u32 num_sample_info; + __u32 op;/* kfd_ioctl_pc_sample_op */ + __u32 gpu_id; + __u32 trace_id; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_DBG_TRAP\ AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args) +#define AMDKFD_IOC_PC_SAMPLE \ + AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x27 +#define AMDKFD_COMMAND_END 0x28 #endif -- 2.25.1
Re: [PATCH 21/24] drm/amdkfd: add queue remapping
On 2023-11-23 18:01, Felix Kuehling wrote:
On 2023-11-23 17:41, Greathouse, Joseph wrote:
[Public]
-----Original Message-----
From: Zhu, James
Sent: Thursday, November 23, 2023 1:49 PM
On 2023-11-23 14:02, Felix Kuehling wrote:
On 2023-11-23 11:25, James Zhu wrote:
On 2023-11-22 17:35, Felix Kuehling wrote:
On 2023-11-03 09:11, James Zhu wrote:

Add queue remapping to force the waves in any running processes to complete a CWSR trap.

Please add an explanation why this is needed.

[JZ] Even though the profiling-enabled bit is turned off, the CWSR trap handlers for some kernels of this process may still be running. This will force the waves in any running processes to complete a CWSR trap, and make sure pc sampling is completely stopped for this process. I will add it later.

It may be confusing to talk specifically about "CWSR trap handler". There is only one trap handler that is triggered by different events: CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap triggers, it serializes with any currently running trap handler in that wavefront. So it seems that you're using CWSR as a way to ensure that any host trap has completed: CWSR will wait for previous traps to finish before trapping again for CWSR, the HWS firmware waits for CWSR completion and the driver waits for HWS to finish CWSR with a fence on a HIQ QUERY_STATUS packet. Is that correct?

[JZ] I think your explanation is more detailed. Need Joseph to confirm.

Felix, your summary is correct. The reason we are trying to perform a queue unmap/map cycle as part of the PC sampling stop is to prevent the following:

1. A PC sampling request arrives to Wave X, sending it to the 1st-level trap handler
2. User thread asks KFD to stop sampling for this process, which leads to kfd_pc_sample_stop()
3. kfd_pc_sample_stop() decrements the sampling refcount. If this is the last process to stop sampling, it stops any further sampling traps from being generated
4. kfd_pc_sample_stop() sets this process's TMA flag to false so waves in the 1st-level trap handler know sampling is disabled
   4.1. Wave X may be in the 1st-level handler and not yet have checked the TMA flag. If so, it will exit the 1st-level handler when it sees the flag is false
   4.2. Wave X may have already passed the 1st-level TMA flag check and entered the 2nd-level trap handler to do the PC sample
5. kfd_pc_sample_stop() returns, eventually causing the ioctl to return, back to user-space
6. Because the stop ioctl has returned, user-land deallocates the user-space buffer the 2nd-level trap handler uses to output sample data
7. Wave X that was in the 2nd-level handler tries to finish its sample output and writes to the now-freed location, causing a use-after-free

Note that Step 3 does not always stop further traps from arriving -- if another process still wants to do sampling, the driver or HW might still send traps to every wave on the device after Step 3. As such, to avoid going into the 2nd-level handler for non-sampled processes, all 1st-level handlers must check their TMA flag to see if they should allow the sample to flow to the 2nd-level handler.

By removing the queue from the HW after Step 4, we can be sure that any existing waves from this process that entered the PC sampling 2nd-level handler before Step 4 are done.

Any waves that were still in the 1st-level handler at Step 4.1 will be filtered by the TMA flag being set to false. CWSR will wait until they exit.

Any waves that were already in the 2nd-level handler (4.2) must complete before the CWSR save will complete and allow this queue removal request to complete.

Any waves that enter the 1st-level trap handler after Step 4 won't go into the PC sampling logic in the 2nd-level handler because the TMA flag is set to false. CWSR will wait until they exit.

When we then put the queue back on the hardware, any further traps that might show up (e.g. because another process is sampling) will get filtered by the TMA flag.

So once the queue removal (and thus CWSR save cycle) has completed, we can be sure that no other traps to this process will try to use its PC sample data buffer, so it's safe to return to user-space and let them potentially free that buffer.

I don't know how to summarize this nicely in a comment, but hopefully y'all can figure that out. :)

My best summary: We need to ensure that any waves executing the PC sampling part of the trap handler are done before kfd_pc_sample_stop returns, and that no new waves enter that part of the trap handler afterwards. This avoids race conditions that could lead to use-after-free. Unmapping and remapping the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish.

[JZ] Thanks all!

Regards, Felix

Thanks, -Joe

Regards, Felix

Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_mana
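The ordering argument in this thread can be modelled with a small, deterministic simulation. It is only a sketch of the reasoning, not driver or trap-handler code: `struct process_model`, `wave_finish_trap()`, `drain_waves()` and `sample_stop()` are hypothetical names, `drain_waves()` stands in for the queue unmap/map cycle, and real waves run concurrently rather than in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a wave is relative to the trap handler when stop is requested. */
enum wave_state { WAVE_RUNNING, WAVE_IN_LEVEL1, WAVE_IN_LEVEL2 };

struct process_model {
	bool sampling_flag;    /* stands in for the per-process TMA flag */
	bool buffer_valid;     /* user sample buffer is still allocated */
	int samples_written;   /* writes that hit a live buffer */
	int writes_after_free; /* would be a use-after-free on hardware */
};

/* Let one wave finish whatever trap work it is in the middle of. */
static void wave_finish_trap(struct process_model *p, enum wave_state *w)
{
	/* 1st level: continue to the sampling code only if the flag is set. */
	if (*w == WAVE_IN_LEVEL1 && p->sampling_flag)
		*w = WAVE_IN_LEVEL2;

	/* 2nd level: write the sample, whether or not the buffer is alive. */
	if (*w == WAVE_IN_LEVEL2) {
		if (p->buffer_valid)
			p->samples_written++;
		else
			p->writes_after_free++;
	}
	*w = WAVE_RUNNING;
}

/* Stand-in for queue unmap/map: returns once no wave is in the handler. */
static void drain_waves(struct process_model *p, enum wave_state *waves,
			size_t n)
{
	for (size_t i = 0; i < n; i++)
		wave_finish_trap(p, &waves[i]);
}

/* Model of the stop path, with or without the remapping step. */
static void sample_stop(struct process_model *p, enum wave_state *waves,
			size_t n, bool remap_queues)
{
	p->sampling_flag = false; /* step 4 in the list above */
	if (remap_queues)
		drain_waves(p, waves, n);
}
```

With `remap_queues` set, a wave caught in the 2nd-level handler finishes its write before the buffer can be freed. Without it, the same write lands after the free.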
Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid
On 2023-11-23 15:32, Felix Kuehling wrote: On 2023-11-23 15:18, James Zhu wrote: On 2023-11-22 17:15, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Check pcs_enrty valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++-- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 4c9fc48e1a6a..36366c8847de 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_ int kfd_pc_sample(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *args) { + struct pc_sampling_entry *pcs_entry; + + if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES && + args->op != KFD_IOCTL_PCS_OP_CREATE) { + + mutex_lock(>dev->pcs_data.mutex); + pcs_entry = idr_find(>dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + args->trace_id); + mutex_unlock(>dev->pcs_data.mutex); You need to keep holding the lock while the pcs_entry is still used. That includes any of the kfd_pc_sample_ functions below. Otherwise someone could free it concurrently. It would also simplify the ..._ functions, if they didn't have to worry about the locking themselves. [JZ] pcs_entry is only for this pc sampling process, which has kfd_process->mutex protected here. OK. That's not obvious. I'm also wary about depending too much on the big process lock. We will need to make that locking more granular soon, because it is causing performance issues with multi-threaded processes. [Jz] Let me add some comments on pcs_entry. 
Regards, Felix Regards, Felix + + if (!pcs_entry || + pcs_entry->pdd != pdd) + return -EINVAL; + } + switch (args->op) { case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: return kfd_pc_sample_query_cap(pdd, args); @@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd, return kfd_pc_sample_create(pdd, args); case KFD_IOCTL_PCS_OP_DESTROY: - return kfd_pc_sample_destroy(pdd, args->trace_id); + if (pcs_entry->enabled) + return -EBUSY; + else + return kfd_pc_sample_destroy(pdd, args->trace_id); case KFD_IOCTL_PCS_OP_START: - return kfd_pc_sample_start(pdd); + if (pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_start(pdd); case KFD_IOCTL_PCS_OP_STOP: - return kfd_pc_sample_stop(pdd); + if (!pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_stop(pdd); } return -EINVAL;
Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start
On 2023-11-23 15:21, Felix Kuehling wrote: On 2023-11-23 15:01, James Zhu wrote: On 2023-11-22 17:27, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 60b29b245db5..33d003ca0093 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_start(struct kfd_process_device *pdd) +static int kfd_pc_sample_start(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_start = false; + + pcs_entry->enabled = true; + mutex_lock(>dev->pcs_data.mutex); + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) + pc_sampling_start = true; + pdd->dev->pcs_data.hosttrap_entry.base.active_count++; + mutex_unlock(>dev->pcs_data.mutex); + + while (pc_sampling_start) { + if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) { + usleep_range(1000, 2000); I don't understand why you need this synchronization through stop_enable. Why can't you do both the start and stop while holding the mutex? It's just setting a flag in the TMA, so it's not a time-consuming operation, and I don't see any potential for deadlocks. [JZ] for stop, not just set TMA. need wait for current pc sampling completely stop and reset some initial setting. I think that's being obfuscated by how you split up this patch series. Maybe if you squash the queue remapping patch into this one, it would be more obvious what's really happening when you stop sampling and would make it easier to review the synchronization and locking strategy. 
[JZ] Sure Regards, Felix Regards, Felix + } else { + kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + break; + } + } + + return 0; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd) @@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_start(pdd); + return kfd_pc_sample_start(pdd, pcs_entry); case KFD_IOCTL_PCS_OP_STOP: if (!pcs_entry->enabled) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 6670534f47b8..613910e0d440 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -258,6 +258,8 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + uint32_t active_count; /* Num of active sessions */ + bool stop_enable; /* pc sampling stop in process */ struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; };
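The review question in this message (why not start and stop while holding the mutex?) can be pictured with a minimal userspace sketch. It is not the patch's code: `struct sampler`, `sampler_start`, `sampler_stop` and the `hw_enabled` field are hypothetical names, with `hw_enabled` standing in for the TMA flag. The sketch also assumes the stop-side teardown can complete while the lock is held, which is exactly the point James Zhu disputes.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device sampling state. */
struct sampler {
	pthread_mutex_t lock;
	unsigned int active_count;	/* number of started sessions */
	bool hw_enabled;		/* stands in for the TMA sampling flag */
};

/* First session to start switches sampling on; later ones only count. */
static void sampler_start(struct sampler *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->active_count++ == 0)
		s->hw_enabled = true;
	pthread_mutex_unlock(&s->lock);
}

/* Last session to stop switches sampling off, still under the same lock. */
static void sampler_stop(struct sampler *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->active_count > 0 && --s->active_count == 0)
		s->hw_enabled = false;
	pthread_mutex_unlock(&s->lock);
}
```

Because both transitions happen under one lock, a start can never observe a half-finished stop, so no separate `stop_enable` polling loop is needed in this simplified model.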
Re: [PATCH 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support
On 2023-11-22 16:14, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: From: David Yat Sin Add pc sampling support in kfd_ioctl. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include/uapi/linux/kfd_ioctl.h | 57 +- 1 file changed, 56 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index f0ed68974c54..5202e29c9560 100644 --- a/include/uapi/linux/kfd_ioctl.h +++ b/include/uapi/linux/kfd_ioctl.h @@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args { }; }; +/** + * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations + * + * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities + * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance + * @KFD_IOCTL_PCS_OP_DESTROY: Unregister from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance + * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance + */ +enum kfd_ioctl_pc_sample_op { + KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES, + KFD_IOCTL_PCS_OP_CREATE, + KFD_IOCTL_PCS_OP_DESTROY, + KFD_IOCTL_PCS_OP_START, + KFD_IOCTL_PCS_OP_STOP, +}; + +/* Values have to be a power of 2*/ +#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001 + +enum kfd_ioctl_pc_sample_method { + KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1, + KFD_IOCTL_PCS_METHOD_STOCHASTIC, +}; + +enum kfd_ioctl_pc_sample_type { + KFD_IOCTL_PCS_TYPE_TIME_US, + KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES, + KFD_IOCTL_PCS_TYPE_INSTRUCTIONS +}; + +struct kfd_pc_sample_info { + __u64 value; /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us + * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles + * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by + * graphics compute units I'd call this "interval". 
That's still generic enough to be a sampling interval in a unit that depends on the PCS type. "value" is misleading, because it sounds like it may be an actual sample. [JZ] I am fine with this interface name change. + */ + __u64 value_min; /* [OUT] */ + __u64 value_max; /* [OUT] */ interval_min/max. Regards, Felix + __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */ + __u32 method; /* [IN/OUT] kfd_ioctl_pc_sample_method */ + __u32 type; /* [IN/OUT] kfd_ioctl_pc_sample_type */ +}; + +struct kfd_ioctl_pc_sample_args { + __u64 sample_info_ptr; /* array of kfd_pc_sample_info */ + __u32 num_sample_info; + __u32 op; /* kfd_ioctl_pc_sample_op */ + __u32 gpu_id; + __u32 trace_id; +}; + #define AMDKFD_IOCTL_BASE 'K' #define AMDKFD_IO(nr) _IO(AMDKFD_IOCTL_BASE, nr) #define AMDKFD_IOR(nr, type) _IOR(AMDKFD_IOCTL_BASE, nr, type) @@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args { #define AMDKFD_IOC_DBG_TRAP \ AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args) +#define AMDKFD_IOC_PC_SAMPLE \ + AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args) + #define AMDKFD_COMMAND_START 0x01 -#define AMDKFD_COMMAND_END 0x27 +#define AMDKFD_COMMAND_END 0x28 #endif
Re: [PATCH 05/24] drm/amdkfd: enable pc sampling create
On 2023-11-22 16:51, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 10 2 files changed, 63 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 49fecbc7013e..f0d910ee730c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -97,7 +97,59 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd) static int kfd_pc_sample_create(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { - return -EINVAL; + struct kfd_pc_sample_info *supported_format = NULL; + struct kfd_pc_sample_info user_info; + int ret; + int i; + + if (user_args->num_sample_info != 1) + return -EINVAL; + + ret = copy_from_user(&user_info, (void __user *) user_args->sample_info_ptr, + sizeof(struct kfd_pc_sample_info)); + if (ret) { + pr_debug("Failed to copy PC sampling info from user\n"); + return -EFAULT; + } + + for (i = 0; i < ARRAY_SIZE(supported_formats); i++) { + if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version + && user_info.method == supported_formats[i].sample_info->method + && user_info.type == supported_formats[i].sample_info->type + && user_info.value <= supported_formats[i].sample_info->value_max + && user_info.value >= supported_formats[i].sample_info->value_min) { + supported_format = + (struct kfd_pc_sample_info *)supported_formats[i].sample_info; + break; + } + } + + if (!supported_format) { + pr_debug("Sampling format is not supported!"); + return -EOPNOTSUPP; + } + + mutex_lock(&pdd->dev->pcs_data.mutex); + if (pdd->dev->pcs_data.hosttrap_entry.base.use_count && + memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, + &user_info, sizeof(user_info))) { I think you 
can compare structures in C. This would be more readable: if (pdd->dev->pcs_data.hosttrap_entry.base.use_count && pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info != user_info) { ... } [JZ] Sure + ret = copy_to_user((void __user *) user_args->sample_info_ptr, + &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, + sizeof(struct kfd_pc_sample_info)); + mutex_unlock(&pdd->dev->pcs_data.mutex); + return ret ? ret : -EEXIST; When copy_to_user fails, it returns the number of bytes not copied. That's not a useful return value here. This should be return ret ? -EFAULT : -EEXIST; Also -EBUSY may be more appropriate than -EEXIST. [JZ] Sure + } + + /* TODO: add trace_id return */ + + if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) + memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, + &user_info, sizeof(user_info)); I think you can assign structures in C. Just do pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = user_info; [JZ] Sure Regards, Felix + + pdd->dev->pcs_data.hosttrap_entry.base.use_count++; + mutex_unlock(&pdd->dev->pcs_data.mutex); + + return 0; } static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_id) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 4a0b66189c67..81c925fb2952 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -256,9 +256,19 @@ struct kfd_vmid_info { struct kfd_dev; +struct kfd_dev_pc_sampling_data { + uint32_t use_count; /* Num of PC sampling sessions */ + struct kfd_pc_sample_info pc_sample_info; +}; + +struct kfd_dev_pcs_hosttrap { + struct kfd_dev_pc_sampling_data base; +}; + /* Per device PC Sampling data */ struct kfd_dev_pc_sampling { struct mutex mutex; + struct kfd_dev_pcs_hosttrap hosttrap_entry; }; struct kfd_node {
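A side note on the two struct suggestions in this message, as a minimal userspace sketch rather than kernel code. ISO C allows assigning one struct to another with `=`, but it does not define `==` or `!=` for struct operands, so the comparison still needs `memcmp` or a field-by-field helper. `struct sample_info` below is a hypothetical stand-in that reuses the field names of `kfd_pc_sample_info` from the patch; `sample_info_equal` and `sample_info_store` are illustrative names, not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kfd_pc_sample_info from the patch. */
struct sample_info {
	uint64_t value;
	uint64_t value_min;
	uint64_t value_max;
	uint64_t flags;
	uint32_t method;
	uint32_t type;
};

/* Field-by-field comparison; unlike memcmp() it ignores any padding bytes. */
static bool sample_info_equal(const struct sample_info *a,
			      const struct sample_info *b)
{
	return a->value == b->value &&
	       a->value_min == b->value_min &&
	       a->value_max == b->value_max &&
	       a->flags == b->flags &&
	       a->method == b->method &&
	       a->type == b->type;
}

/* Plain struct assignment is valid C and can replace a memcpy() call. */
static void sample_info_store(struct sample_info *dst,
			      const struct sample_info *src)
{
	*dst = *src;
}
```

Writing `a != b` on two `struct sample_info` values is rejected by the compiler, so a helper like this (or the original `memcmp`) is what the comparison would have to look like in practice.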
Re: [PATCH 06/24] drm/amdkfd: add trace_id return
On 2023-11-22 16:56, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Add trace_id return for new pc sampling creation per device, Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6 ++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0e24e011f66b..bcaeedac8fe0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev) static void kfd_pc_sampling_init(struct kfd_node *dev) { mutex_init(&dev->pcs_data.mutex); + idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1); } static void kfd_pc_sampling_exit(struct kfd_node *dev) { + idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr); mutex_destroy(&dev->pcs_data.mutex); } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index f0d910ee730c..4c9fc48e1a6a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, { struct kfd_pc_sample_info *supported_format = NULL; struct kfd_pc_sample_info user_info; + struct pc_sampling_entry *pcs_entry; int ret; int i; @@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, return ret ? ret : -EEXIST; } - /* TODO: add trace_id return */ + pcs_entry = kvzalloc(sizeof(*pcs_entry), GFP_KERNEL); I don't see a reason to use kvzalloc here. You know the size of the structure, so kzalloc should be perfectly fine. 
[JZ] Sure, will change to kzalloc + if (!pcs_entry) { + mutex_unlock(&pdd->dev->pcs_data.mutex); + return -ENOMEM; + } + + i = idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + pcs_entry, 1, 0, GFP_KERNEL); + if (i < 0) { + mutex_unlock(&pdd->dev->pcs_data.mutex); + kvfree(pcs_entry); kfree + return i; + } if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count) memcpy(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, @@ -149,6 +162,11 @@ static int kfd_pc_sample_create(struct kfd_process_device *pdd, pdd->dev->pcs_data.hosttrap_entry.base.use_count++; mutex_unlock(&pdd->dev->pcs_data.mutex); + pcs_entry->pdd = pdd; + user_args->trace_id = (uint32_t)i; I suspect this should be done inside the lock. You don't want someone looking up the pcs_entry before it has been initialized. [JZ] pcs_entry is for this pc sampling process, and it is protected by kfd_process->mutex. Regards, Felix + + pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", pcs_entry, i, pdd->dev->id); + return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 81c925fb2952..642558026d16 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -258,6 +258,7 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; }; @@ -743,6 +744,11 @@ enum kfd_pdd_bound { */ #define SDMA_ACTIVITY_DIVISOR 100 +struct pc_sampling_entry { + bool enabled; + struct kfd_process_device *pdd; +}; + /* Data that is per-process-per device. */ struct kfd_process_device { /* The device that owns this data. */
Re: [PATCH 07/24] drm/amdkfd: check pcs_enrty valid
On 2023-11-22 17:15, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Check pcs_enrty valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 30 ++-- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 4c9fc48e1a6a..36366c8847de 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -179,6 +179,21 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_ int kfd_pc_sample(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *args) { + struct pc_sampling_entry *pcs_entry; + + if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES && + args->op != KFD_IOCTL_PCS_OP_CREATE) { + + mutex_lock(&pdd->dev->pcs_data.mutex); + pcs_entry = idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, + args->trace_id); + mutex_unlock(&pdd->dev->pcs_data.mutex); You need to keep holding the lock while the pcs_entry is still used. That includes any of the kfd_pc_sample_ functions below. Otherwise someone could free it concurrently. It would also simplify the ..._ functions, if they didn't have to worry about the locking themselves. [JZ] pcs_entry is only for this pc sampling process, which is protected by kfd_process->mutex here. 
Regards, Felix + + if (!pcs_entry || + pcs_entry->pdd != pdd) + return -EINVAL; + } + switch (args->op) { case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: return kfd_pc_sample_query_cap(pdd, args); @@ -187,13 +202,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd, return kfd_pc_sample_create(pdd, args); case KFD_IOCTL_PCS_OP_DESTROY: - return kfd_pc_sample_destroy(pdd, args->trace_id); + if (pcs_entry->enabled) + return -EBUSY; + else + return kfd_pc_sample_destroy(pdd, args->trace_id); case KFD_IOCTL_PCS_OP_START: - return kfd_pc_sample_start(pdd); + if (pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_start(pdd); case KFD_IOCTL_PCS_OP_STOP: - return kfd_pc_sample_stop(pdd); + if (!pcs_entry->enabled) + return -EALREADY; + else + return kfd_pc_sample_stop(pdd); } return -EINVAL;
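The locking concern raised in this review (keep holding the lock while the looked-up entry is in use) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver's code: `struct entry`, `table`, `table_lock` and `entry_start` are hypothetical names, a fixed array stands in for the IDR, and an integer `owner` stands in for the `pdd` pointer check. James Zhu's reply argues the kernel code is already serialized by `kfd_process->mutex`; the sketch only shows the pattern Felix Kuehling describes.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* Hypothetical userspace stand-in for struct pc_sampling_entry. */
struct entry {
	bool enabled;
	int owner;	/* stands in for the pdd pointer comparison */
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *table[MAX_ENTRIES];	/* stands in for the IDR */

/*
 * Start sampling for trace_id. The lock is held across the lookup, the
 * ownership check and the state change, so the entry cannot be removed
 * by another thread between being found and being used.
 */
static int entry_start(int owner, unsigned int trace_id)
{
	struct entry *e;
	int ret = 0;

	pthread_mutex_lock(&table_lock);
	e = trace_id < MAX_ENTRIES ? table[trace_id] : NULL;
	if (!e || e->owner != owner)
		ret = -EINVAL;
	else if (e->enabled)
		ret = -EALREADY;
	else
		e->enabled = true;
	pthread_mutex_unlock(&table_lock);

	return ret;
}
```

With this shape, a destroy path that takes the same lock before clearing the slot and freeing the entry cannot race with a start or stop on that entry.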
Re: [PATCH 18/24] drm/amdkfd: enable pc sampling start
On 2023-11-22 17:27, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 ++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 60b29b245db5..33d003ca0093 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -83,9 +83,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, return 0; } -static int kfd_pc_sample_start(struct kfd_process_device *pdd) +static int kfd_pc_sample_start(struct kfd_process_device *pdd, + struct pc_sampling_entry *pcs_entry) { - return -EINVAL; + bool pc_sampling_start = false; + + pcs_entry->enabled = true; + mutex_lock(&pdd->dev->pcs_data.mutex); + if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) + pc_sampling_start = true; + pdd->dev->pcs_data.hosttrap_entry.base.active_count++; + mutex_unlock(&pdd->dev->pcs_data.mutex); + + while (pc_sampling_start) { + if (READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) { + usleep_range(1000, 2000); I don't understand why you need this synchronization through stop_enable. Why can't you do both the start and stop while holding the mutex? It's just setting a flag in the TMA, so it's not a time-consuming operation, and I don't see any potential for deadlocks. [JZ] Stop does more than set the TMA flag: it needs to wait for the current pc sampling to stop completely and reset some initial settings. 
Regards, Felix + } else { + kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, + pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + break; + } + } + + return 0; } static int kfd_pc_sample_stop(struct kfd_process_device *pdd) @@ -225,7 +245,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd, if (pcs_entry->enabled) return -EALREADY; else - return kfd_pc_sample_start(pdd); + return kfd_pc_sample_start(pdd, pcs_entry); case KFD_IOCTL_PCS_OP_STOP: if (!pcs_entry->enabled) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index 6670534f47b8..613910e0d440 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -258,6 +258,8 @@ struct kfd_dev; struct kfd_dev_pc_sampling_data { uint32_t use_count; /* Num of PC sampling sessions */ + uint32_t active_count; /* Num of active sessions */ + bool stop_enable; /* pc sampling stop in process */ struct idr pc_sampling_idr; struct kfd_pc_sample_info pc_sample_info; };
Re: [PATCH 20/24] drm/amdkfd: enable pc sampling work to trigger trap
On 2023-11-23 14:08, Felix Kuehling wrote: On 2023-11-23 13:27, James Zhu wrote: On 2023-11-22 17:31, Felix Kuehling wrote: On 2023-11-03 09:11, James Zhu wrote: Enable a delay work to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 39 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 1 + drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 + 4 files changed, 44 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index bcaeedac8fe0..fb21902e433a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -35,6 +35,7 @@ #include "kfd_migrate.h" #include "amdgpu.h" #include "amdgpu_xcp.h" +#include "kfd_pc_sampling.h" #define MQD_SIZE_ALIGNED 768 @@ -537,6 +538,8 @@ static void kfd_pc_sampling_init(struct kfd_node *dev) { mutex_init(&dev->pcs_data.mutex); idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1); + INIT_WORK(&dev->pcs_data.hosttrap_entry.base.pc_sampling_work, + kfd_pc_sample_handler); } static void kfd_pc_sampling_exit(struct kfd_node *dev) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index 2c4ac5b4cc4b..e8f0559b618e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c @@ -38,6 +38,43 @@ struct supported_pc_sample_info supported_formats[] = { { IP_VERSION(9, 4, 2), _info_hosttrap_9_0_0 }, }; +void kfd_pc_sample_handler(struct work_struct *work) +{ + struct amdgpu_device *adev; + struct kfd_node *node; + uint32_t timeout = 0; + + node = container_of(work, struct kfd_node, + pcs_data.hosttrap_entry.base.pc_sampling_work); + + mutex_lock(&node->pcs_data.mutex); + if (node->pcs_data.hosttrap_entry.base.active_count && + node->pcs_data.hosttrap_entry.base.pc_sample_info.value && + node->kfd2kgd->trigger_pc_sample_trap) { + switch 
(node->pcs_data.hosttrap_entry.base.pc_sample_info.type) { + case KFD_IOCTL_PCS_TYPE_TIME_US: + timeout = (uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.value; + break; + default: + pr_debug("PC Sampling type %d not supported.", + node->pcs_data.hosttrap_entry.base.pc_sample_info.type); + } + } + mutex_unlock(&node->pcs_data.mutex); + if (!timeout) + return; + + adev = node->adev; + while (!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) { This worker basically runs indefinitely (controlled by user mode). + node->kfd2kgd->trigger_pc_sample_trap(adev, node->vm_info.last_vmid_kfd, + &node->pcs_data.hosttrap_entry.base.target_simd, + &node->pcs_data.hosttrap_entry.base.target_wave_slot, + node->pcs_data.hosttrap_entry.base.pc_sample_info.method); + pr_debug_ratelimited("triggered a host trap."); + usleep_range(timeout, timeout + 10); This will cause drift of the interval. Instead what you should do, is calculate the wait time at the end of every iteration based on the current time and the interval. [JZ] I am wondering what degree of accuracy is required for the interval; there is a HW time stamp with each pc sampling data packet. + } +} + static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd, struct kfd_ioctl_pc_sample_args __user *user_args) { @@ -101,6 +138,7 @@ static int kfd_pc_sample_start(struct kfd_process_device *pdd, } else { kfd_process_set_trap_pc_sampling_flag(&pdd->qpd, pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true); + schedule_work(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work); Scheduling a worker that runs indefinitely on the system workqueue is probably a bad idea. It could block other work items indefinitely. I think you are misusing the work queue API here. What you really want is probably to create a kernel thread. [JZ] Yes, you are right. How about using alloc_workqueue to create a dedicated queue instead of the system queue? Is alloc_workqueue more efficient than kernel thread creation? 
A work queue can create many kernel threads to handle the execution of work items. You really only need a single kernel thread per GPU for time-based PC sampling. IMO the work queue just adds a bunch of overhead. Using a work queue for something that runs indefinitely feels like an abuse of the API. I don't have much experience with creating kernel threads directly. See include/linux/kthread.h. If you want to look for an example, it seems drivers/gpu/drm/scheduler uses the kthread API. [JZ] then let me switch to kthread Regards, Felix Regards, Felix break; } } @@
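The drift comment in this thread (compute the wait from the current time and the interval instead of sleeping a fixed amount) can be illustrated with a small userspace sketch. It uses POSIX `clock_gettime` and `clock_nanosleep` with `TIMER_ABSTIME`; the helper names `deadline_advance` and `run_periodic` and the `tick` callback are hypothetical and stand in for the trap-trigger loop. A kernel version would need kernel timekeeping primitives instead; this only shows the arithmetic.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance an absolute deadline by interval_us microseconds. */
static void deadline_advance(struct timespec *deadline, uint64_t interval_us)
{
	uint64_t nsec = (uint64_t)deadline->tv_nsec +
			(interval_us % 1000000) * 1000;

	deadline->tv_sec += (time_t)(interval_us / 1000000) +
			    (time_t)(nsec / NSEC_PER_SEC);
	deadline->tv_nsec = (long)(nsec % NSEC_PER_SEC);
}

/*
 * Call tick() count times, one interval apart. Each deadline is derived
 * from the previous deadline, not from the time the sleep returned, so
 * the time spent inside tick() and any wake-up latency do not accumulate.
 */
static void run_periodic(void (*tick)(void), uint64_t interval_us,
			 unsigned int count)
{
	struct timespec deadline;

	clock_gettime(CLOCK_MONOTONIC, &deadline);
	while (count--) {
		tick();
		deadline_advance(&deadline, interval_us);
		/* Absolute sleep: returns at once if the deadline has passed. */
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
	}
}
```

By contrast, a loop that sleeps for a fixed `timeout` after each trigger has a period of `timeout` plus the trigger cost plus scheduling latency, which is the drift being pointed out.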