Re: [PATCH] drm/amdkfd: disable SVM for GC 10.1.3/4
We need heavy-weight flushes not just for SVM. If this is broken it will affect ROCm either way.

Regards,
  Felix

On 2023-09-07 08:08, Lang Yu wrote:

GC 10.1.3/4 have problems with TLB_FLUSH_HEAVYWEIGHT, which is used by SVM in svm_range_unmap_from_gpus().

Signed-off-by: Lang Yu
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 7d82c7da223a..dd3db3d88d59 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -992,6 +992,22 @@ static const struct dev_pagemap_ops svm_migrate_pgmap_ops = {
 /* Each VRAM page uses sizeof(struct page) on system memory */
 #define SVM_HMM_PAGE_STRUCT_SIZE(size) ((size)/PAGE_SIZE * sizeof(struct page))
 
+static inline bool is_zone_device_needed(struct amdgpu_device *adev)
+{
+	/* Page migration works on gfx9 or newer */
+	if (adev->ip_versions[GC_HWIP][0] < IP_VERSION(9, 0, 1))
+		return false;
+
+	if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 3) ||
+	    adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 1, 4))
+		return false;
+
+	if (adev->gmc.is_app_apu)
+		return false;
+
+	return true;
+}
+
 int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
 {
 	struct amdgpu_kfd_dev *kfddev = &adev->kfd;
@@ -1000,11 +1016,7 @@ int kgd2kfd_init_zone_device(struct amdgpu_device *adev)
 	unsigned long size;
 	void *r;
 
-	/* Page migration works on gfx9 or newer */
-	if (adev->ip_versions[GC_HWIP][0] < IP_VERSION(9, 0, 1))
-		return -EINVAL;
-
-	if (adev->gmc.is_app_apu)
+	if (!is_zone_device_needed(adev))
 		return 0;
 
 	pgmap = &kfddev->pgmap;
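The gating logic in the patch can be exercised standalone. The sketch below re-creates the kernel's IP_VERSION encoding (major/minor/revision packed into one integer so versions compare numerically) and the check from the patch; `is_zone_device_needed` here takes the version and APU flag directly instead of an `amdgpu_device`, since this is a userspace illustration, not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Re-creation of amdgpu's IP_VERSION encoding: pack major/minor/revision
 * so discrete IP versions can be compared as plain integers. */
#define IP_VERSION(maj, min, rev) (((maj) << 16) | ((min) << 8) | (rev))

/* Sketch of the patch's gating logic: set up ZONE_DEVICE (SVM page
 * migration) only where it is known to work. */
static bool is_zone_device_needed(uint32_t gc_ip_version, bool is_app_apu)
{
	/* Page migration works on gfx9 or newer */
	if (gc_ip_version < IP_VERSION(9, 0, 1))
		return false;

	/* GC 10.1.3/4: TLB_FLUSH_HEAVYWEIGHT is broken, so skip SVM setup */
	if (gc_ip_version == IP_VERSION(10, 1, 3) ||
	    gc_ip_version == IP_VERSION(10, 1, 4))
		return false;

	/* App APUs share host memory; no VRAM ZONE_DEVICE pages needed */
	if (is_app_apu)
		return false;

	return true;
}
```

Note the patch also changes the pre-gfx9 case from returning -EINVAL to returning 0, folding all "not needed/not supported" cases into a silent success.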
Re: [PATCHv3] drm/amdkfd: Fix unaligned 64-bit doorbell warning
On 2023-09-06 11:39, Mukul Joshi wrote: This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] [ cut here ] [ +0.03] Unaligned 64-bit doorbell [ +0.30] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [ +0.03] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [ +0.04] RSP: 0018:c90004287730 EFLAGS: 00010246 [ +0.05] RAX: RBX: RCX: [ +0.03] RDX: 0001 RSI: 82837c71 RDI: [ +0.03] RBP: c90004287748 R08: 0003 R09: 0001 [ +0.02] R10: 001a R11: 88a034008198 R12: c900013bd004 [ +0.03] R13: 0008 R14: c900042877b0 R15: 007f [ +0.03] FS: 7fa8c7b62000() GS:889f8840() knlGS: [ +0.04] CS: 0010 DS: ES: CR0: 80050033 [ +0.03] CR2: 56111c45aaf0 CR3: 0001414f2002 CR4: 00770ee0 [ +0.03] PKRU: 5554 [ +0.02] Call Trace: [ +0.04] [ +0.06] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.12] ? do_pci_enable_device+0xdc/0x110 [ +0.08] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: cfeaeb3c0ce7 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- v1->v2: - Update the logic to make it work with both 32 bit 64 bit doorbells. - Add the Fixed tag v2->v3: - Revert to the original change to align it with whats done in amdgpu_doorbell_index_on_bar. 
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index c2e0b79dcc6d..7b38537c7c99 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -162,6 +162,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 		return NULL;
 
 	*doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
+	inx *= 2;
 
 	pr_debug("Get kernel queue doorbell\n"
 			"     doorbell offset   == 0x%08X\n"
@@ -176,6 +177,7 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
 	unsigned int inx;
 
 	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
+	inx /= 2;
 
 	mutex_lock(&kfd->doorbell_mutex);
 	__clear_bit(inx, kfd->doorbell_bitmap);
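The index arithmetic behind the fix can be sketched in isolation. `doorbell_kernel_ptr` is typed `u32 *` while the allocation bitmap tracks one bit per doorbell; a 64-bit doorbell spans two u32 slots, so the bitmap index is doubled on allocation (`inx *= 2`) and halved on release (`inx /= 2`). The helper names below are hypothetical, for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Map a doorbell bitmap index to an offset into the u32-typed kernel
 * doorbell array: each 64-bit doorbell occupies two u32 slots. */
static unsigned int doorbell_u32_offset(unsigned int bitmap_inx)
{
	return bitmap_inx * 2;
}

/* Invert the mapping on release so __clear_bit() clears the same
 * bitmap bit that allocation set. */
static unsigned int doorbell_bitmap_inx(unsigned int u32_offset)
{
	return u32_offset / 2;
}
```

With this mapping the byte offset of every doorbell, `bitmap_inx * 2 * sizeof(u32)`, is a multiple of 8, which is exactly the alignment write_kernel_doorbell64 warns about.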
Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults
On 2023-08-31 16:33, Chen, Xiaogang wrote:

That said, I'm not actually sure why we're freeing the DMA address array after migration to RAM at all. I think we still need it even when we're using VRAM. We call svm_range_dma_map in svm_range_validate_and_map regardless of whether the range is in VRAM or system memory. So it will just allocate a new array the next time the range is validated anyway. VRAM pages use a special address encoding to indicate VRAM pages to the GPUVM code.

I think we do not need to free the DMA address array, as you said; that is a separate issue though. We need to unmap the dma addresses (dma_unmap_page) after migrating from ram to vram because we always do dma_map_page in svm_range_validate_and_map. If not, we would have multiple dma mappings for the same sys ram page.

svm_range_dma_map_dev calls dma_unmap_page before overwriting an existing valid entry in the dma_addr array. Anyway, dma unmapping the old pages in bulk may still be cleaner. And it avoids delays in cleaning up DMA mappings after migrations.

Regards,
  Felix

Then we may not need to do dma_unmap after migrating from ram to vram, since svm_range_dma_map_dev always does dma_unmap_page if the address is a valid dma address for sys ram, and after migrating from ram to vram we always do the gpu mapping?

I think with XNACK enabled, the DMA mapping may be delayed until a page fault. For example on a multi-GPU system, GPU1 page faults and migrates data from system memory to its VRAM. Immediately afterwards, the page fault handler should use svm_validate_and_map to update GPU1 page tables. But GPU2 page tables are not updated immediately. So the now stale DMA mappings for GPU2 would continue to exist until the next page fault on GPU2.

Regards,
  Felix

If I understand correctly: when the user calls svm_range_set_attr, if p->xnack_enabled is true, we can skip calling svm_range_validate_and_map. We postpone the buffer validation and gpu mapping until a page fault, or until the buffer is really used by a GPU, and only dma map and gpu map for that GPU.

The current implementation of svm_range_set_attr skips the validation after migration if XNACK is off, because it is handled by svm_range_restore_work, which gets scheduled by the MMU notifier triggered by the migration. With XNACK on, svm_range_set_attr currently validates and maps after migration, assuming that the data will be used by the GPU(s) soon. That is something we could change and let page faults take care of the mappings as needed.

Regards,
  Felix

Regards,
  Xiaogang
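The post-migration validation policy discussed above can be condensed into a small decision helper. This is a hypothetical function, not kernel code, summarizing the current behavior as described in the thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of svm_range_set_attr's post-migration policy:
 * does it validate and map immediately after migrating a range? */
static bool validates_after_migration(bool xnack_enabled)
{
	if (!xnack_enabled)
		/* XNACK off: validation is left to svm_range_restore_work,
		 * scheduled by the MMU notifier fired by the migration */
		return false;
	/* XNACK on: current code validates and maps eagerly, assuming the
	 * GPU will touch the data soon; deferring to page faults is the
	 * alternative proposed in the thread */
	return true;
}
```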
Re: [PATCH v2 1/2] drm/amdgpu: Merge debug module parameters
On 2023-08-30 18:08, André Almeida wrote: Merge all developer debug options available as separated module parameters in one, making it obvious that are for developers. Drop the obsolete module options in favor of the new ones. Signed-off-by: André Almeida --- v2: - drop old module params - use BIT() macros - replace global var with adev-> vars --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 48 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_crat.c| 2 +- drivers/gpu/drm/amd/include/amd_shared.h | 8 8 files changed, 45 insertions(+), 25 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 4de074243c4d..82eaccfce347 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1101,6 +1101,10 @@ struct amdgpu_device { booldc_enabled; /* Mask of active clusters */ uint32_taid_mask; + + /* Debug */ + booldebug_vm; + booldebug_largebar; }; static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index fb78a8f47587..8a26bed76505 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1191,7 +1191,7 @@ static int amdgpu_cs_vm_handling(struct amdgpu_cs_parser *p) job->vm_pd_addr = amdgpu_gmc_pd_addr(vm->root.bo); } - if (amdgpu_vm_debug) { + if (adev->debug_vm) { /* Invalidate all BOs to test for userspace bugs */ amdgpu_bo_list_for_each_entry(e, p->bo_list) { struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index f5856b82605e..0cd48c025433 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -140,7 +140,6 @@ int amdgpu_vm_size = -1; int amdgpu_vm_fragment_size = -1; int amdgpu_vm_block_size = -1; int amdgpu_vm_fault_stop; -int amdgpu_vm_debug; int amdgpu_vm_update_mode = -1; int amdgpu_exp_hw_support; int amdgpu_dc = -1; @@ -194,6 +193,7 @@ int amdgpu_use_xgmi_p2p = 1; int amdgpu_vcnfw_log; int amdgpu_sg_display = -1; /* auto */ int amdgpu_user_partt_mode = AMDGPU_AUTO_COMPUTE_PARTITION_MODE; +uint amdgpu_debug_mask; static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work); @@ -405,13 +405,6 @@ module_param_named(vm_block_size, amdgpu_vm_block_size, int, 0444); MODULE_PARM_DESC(vm_fault_stop, "Stop on VM fault (0 = never (default), 1 = print first, 2 = always)"); module_param_named(vm_fault_stop, amdgpu_vm_fault_stop, int, 0444); -/** - * DOC: vm_debug (int) - * Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled). - */ -MODULE_PARM_DESC(vm_debug, "Debug VM handling (0 = disabled (default), 1 = enabled)"); -module_param_named(vm_debug, amdgpu_vm_debug, int, 0644); This parameter used to be writable, which means it could be changed through sysfs after loading the module. Code looking at the global variable would see the last value written by user mode. With your changes, this is no longer writable, and driver code is now looking at adev->debug_vm, which cannot be updated through sysfs. As long as everyone is OK with that change, I have no objections. Just pointing it out. Regardless, this patch is Acked-by: Felix Kuehling - /** * DOC: vm_update_mode (int) * Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics only, 2 = Compute only, 3 = Both). 
The default @@ -743,18 +736,6 @@ module_param(send_sigterm, int, 0444); MODULE_PARM_DESC(send_sigterm, "Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)"); -/** - * DOC: debug_largebar (int) - * Set debug_largebar as 1 to enable simulating large-bar capability on non-large bar - * system. This limits the VRAM size reported to ROCm applications to the visible - * size, usually 256MB. - * Default value is 0, diabled. - */ -int debug_largebar; -module_param(debug_largebar, int, 0444); -MODULE_PARM_DESC(debug_largebar, - "Debug large-bar flag used to simulate large-bar capability on non-large bar machine (0 = disable, 1 = enable)"); - /** * DOC: halt_if_hws_hang (int) * Halt if HWS hang is detected. Default value, 0, disables the halt on hang. @@ -938,6 +919,18 @@ module_param_named(user_partt_mode, amdgpu_user_partt_mo
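The mask-based scheme the patch introduces can be sketched in userspace as follows. The bit positions here are hypothetical, chosen only for illustration; the real definitions are the ones the patch adds to amd_shared.h, and the real driver decodes the mask once at probe time into per-device booleans such as adev->debug_vm:

```c
#include <assert.h>
#include <stdbool.h>

/* Re-creation of the kernel's BIT() macro for this sketch. */
#define BIT(n)	(1u << (n))

/* Hypothetical bit assignments; the actual values live in amd_shared.h. */
#define AMDGPU_DEBUG_VM		BIT(0)
#define AMDGPU_DEBUG_LARGEBAR	BIT(1)

/* One module parameter (debug_mask) replaces the old per-feature
 * parameters; each feature just tests its bit. */
static bool debug_vm_enabled(unsigned int debug_mask)
{
	return debug_mask & AMDGPU_DEBUG_VM;
}

static bool debug_largebar_enabled(unsigned int debug_mask)
{
	return debug_mask & AMDGPU_DEBUG_LARGEBAR;
}
```

One consequence Felix points out above: the old vm_debug parameter was 0644 (writable via sysfs at runtime), while a mask decoded once at probe time is effectively read-only after load.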
Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults
On 2023-08-30 19:02, Chen, Xiaogang wrote: On 8/30/2023 3:56 PM, Felix Kuehling wrote: On 2023-08-30 15:39, Chen, Xiaogang wrote: On 8/28/2023 5:37 PM, Felix Kuehling wrote: On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. Now a svm range may have pages from both system ram and vram of one gpu. These chagnes are expected to improve migration performance and reduce mmu callback and TLB flush workloads. Signed-off-by: xiaogang chen --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 153 +++ drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 87 - drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 7 +- 4 files changed, 162 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 7d82c7da223a..5a3aa80a1834 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -479,6 +479,8 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, * svm_migrate_ram_to_vram - migrate svm range from system to device * @prange: range structure * @best_loc: the device to migrate to + * @start_mgr: start page to migrate + * @last_mgr: last page to migrate * @mm: the process mm structure * @trigger: reason of migration * @@ -489,6 +491,7 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, */ static int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, + unsigned long start_mgr, unsigned long last_mgr, struct mm_struct *mm, uint32_t trigger) { unsigned long addr, start, end; @@ -498,9 +501,9 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, unsigned long cpages = 0; long r = 0; - if (prange->actual_loc == best_loc) { 
- pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n", - prange->svms, prange->start, prange->last, best_loc); + if (!best_loc) { + pr_debug("request svms 0x%p [0x%lx 0x%lx] migrate to sys ram\n", + prange->svms, start_mgr, last_mgr); return 0; } @@ -513,8 +516,8 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, pr_debug("svms 0x%p [0x%lx 0x%lx] to gpu 0x%x\n", prange->svms, prange->start, prange->last, best_loc); - start = prange->start << PAGE_SHIFT; - end = (prange->last + 1) << PAGE_SHIFT; + start = start_mgr << PAGE_SHIFT; + end = (last_mgr + 1) << PAGE_SHIFT; r = svm_range_vram_node_new(node, prange, true); if (r) { @@ -544,10 +547,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, if (cpages) { prange->actual_loc = best_loc; - svm_range_free_dma_mappings(prange, true); - } else { + /* only free dma mapping in the migrated range */ + svm_range_free_dma_mappings(prange, true, start_mgr - prange->start, + last_mgr - start_mgr + 1); This is wrong. If we only migrated some of the pages, we should not free the DMA mapping array at all. The array is needed as long as there are any valid DMA mappings in it. yes, I realized it after submit. I can not free DMA mapping array at this stage. The concern(also related to comments below) is I do not know how many pages in vram after partial migration. Originally I used bitmap to record that. I used bitmap to record which pages were migrated at each migration functions. Here I do not need use hmm function to get that info, inside each migration function we can know which pages got migrated, then update the bitmap accordingly inside each migration function. I think the condition above with cpages should be updated. Instead of cpages, we need to keep track of a count of pages in VRAM in struct svm_range. See more below. I think you want add a new integer in svm_range to remember how many pages are in vram side for each svm_range, instead of bitmap. 
There is a problem I saw: when we need split a prange(such as user uses set_attr api) how do we know how many pages in vram for each splitted prange? Right, that's a bit problematic. But it should be a relatively rare corner case. It may be good enough to make a "pessimistic" assumption when splitting ranges that have some pages in VRAM, that everything is in VRAM. And update that to 0 after migrate_to_ram for the entire range, to allow the BO reference to be released. migrate_to_ram is partial migration too that only 2MB vra
Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults
On 2023-08-30 15:39, Chen, Xiaogang wrote: On 8/28/2023 5:37 PM, Felix Kuehling wrote: On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. Now a svm range may have pages from both system ram and vram of one gpu. These chagnes are expected to improve migration performance and reduce mmu callback and TLB flush workloads. Signed-off-by: xiaogang chen --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 153 +++ drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 87 - drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 7 +- 4 files changed, 162 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 7d82c7da223a..5a3aa80a1834 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -479,6 +479,8 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, * svm_migrate_ram_to_vram - migrate svm range from system to device * @prange: range structure * @best_loc: the device to migrate to + * @start_mgr: start page to migrate + * @last_mgr: last page to migrate * @mm: the process mm structure * @trigger: reason of migration * @@ -489,6 +491,7 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, */ static int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, + unsigned long start_mgr, unsigned long last_mgr, struct mm_struct *mm, uint32_t trigger) { unsigned long addr, start, end; @@ -498,9 +501,9 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, unsigned long cpages = 0; long r = 0; - if (prange->actual_loc == best_loc) { - pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n", - prange->svms, 
prange->start, prange->last, best_loc); + if (!best_loc) { + pr_debug("request svms 0x%p [0x%lx 0x%lx] migrate to sys ram\n", + prange->svms, start_mgr, last_mgr); return 0; } @@ -513,8 +516,8 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, pr_debug("svms 0x%p [0x%lx 0x%lx] to gpu 0x%x\n", prange->svms, prange->start, prange->last, best_loc); - start = prange->start << PAGE_SHIFT; - end = (prange->last + 1) << PAGE_SHIFT; + start = start_mgr << PAGE_SHIFT; + end = (last_mgr + 1) << PAGE_SHIFT; r = svm_range_vram_node_new(node, prange, true); if (r) { @@ -544,10 +547,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, if (cpages) { prange->actual_loc = best_loc; - svm_range_free_dma_mappings(prange, true); - } else { + /* only free dma mapping in the migrated range */ + svm_range_free_dma_mappings(prange, true, start_mgr - prange->start, + last_mgr - start_mgr + 1); This is wrong. If we only migrated some of the pages, we should not free the DMA mapping array at all. The array is needed as long as there are any valid DMA mappings in it. yes, I realized it after submit. I can not free DMA mapping array at this stage. The concern(also related to comments below) is I do not know how many pages in vram after partial migration. Originally I used bitmap to record that. I used bitmap to record which pages were migrated at each migration functions. Here I do not need use hmm function to get that info, inside each migration function we can know which pages got migrated, then update the bitmap accordingly inside each migration function. I think the condition above with cpages should be updated. Instead of cpages, we need to keep track of a count of pages in VRAM in struct svm_range. See more below. I think you want add a new integer in svm_range to remember how many pages are in vram side for each svm_range, instead of bitmap. 
There is a problem I saw: when we need split a prange(such as user uses set_attr api) how do we know how many pages in vram for each splitted prange? Right, that's a bit problematic. But it should be a relatively rare corner case. It may be good enough to make a "pessimistic" assumption when splitting ranges that have some pages in VRAM, that everything is in VRAM. And update that to 0 after migrate_to_ram for the entire range, to allow the BO reference to be released. migrate_to_ram is partial migration too that only 2MB vram got migrated. After split if we assume all pages are vram(pessimistic) we will give the ne
Re: [PATCHv2] drm/amdkfd: Fix unaligned 64-bit doorbell warning
On 2023-08-30 16:01, Mukul Joshi wrote: This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] [ cut here ] [ +0.03] Unaligned 64-bit doorbell [ +0.30] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [amdgpu] [ +0.03] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [amdgpu] [ +0.04] RSP: 0018:c90004287730 EFLAGS: 00010246 [ +0.05] RAX: RBX: RCX: [ +0.03] RDX: 0001 RSI: 82837c71 RDI: [ +0.03] RBP: c90004287748 R08: 0003 R09: 0001 [ +0.02] R10: 001a R11: 88a034008198 R12: c900013bd004 [ +0.03] R13: 0008 R14: c900042877b0 R15: 007f [ +0.03] FS: 7fa8c7b62000() GS:889f8840() knlGS: [ +0.04] CS: 0010 DS: ES: CR0: 80050033 [ +0.03] CR2: 56111c45aaf0 CR3: 0001414f2002 CR4: 00770ee0 [ +0.03] PKRU: 5554 [ +0.02] Call Trace: [ +0.04] [ +0.06] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.12] ? do_pci_enable_device+0xdc/0x110 [ +0.08] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Fixes: cfeaeb3c0ce7 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") Signed-off-by: Mukul Joshi --- v1->v2: - Update the logic to make it work with both 32 bit 64 bit doorbells. - Add the Fixed tag. 
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index c2e0b79dcc6d..e0d44f4af18e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -162,6 +162,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 		return NULL;
 
 	*doorbell_off = amdgpu_doorbell_index_on_bar(kfd->adev, kfd->doorbells, inx);
+	inx *= kfd->device_info.doorbell_size / sizeof(u32);

Sorry for going back and forth on this. But you pointed out offline that amdgpu_doorbell_index_on_bar calculates the doorbell address on the bar by always multiplying with 2. I think we need to do the same thing here for calculating the CPU address of the doorbell. Otherwise the CPU may not write to the same doorbell that the GPU is listening on. In practice this only matters on GPUs that create multiple HIQs. But at least I'd like the driver to be internally consistent and calculate the doorbell addresses the same way in the two address spaces.

 	pr_debug("Get kernel queue doorbell\n"
 			"     doorbell offset   == 0x%08X\n"
@@ -175,7 +176,8 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
 {
 	unsigned int inx;
 
-	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
+	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr)
+		* sizeof(u32) / kfd->device_info.doorbell_size;

Same as above.

Regards,
  Felix

 	mutex_lock(&kfd->doorbell_mutex);
 	__clear_bit(inx, kfd->doorbell_bitmap);
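The v2 arithmetic differs from v3 by parameterizing the conversion on the per-ASIC doorbell size (4 bytes on older ASICs, 8 bytes on GFX9 and later) instead of a fixed factor of 2. A standalone sketch, with hypothetical helper names; both directions must agree so that release clears the same bitmap bit that allocation set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitmap index -> offset into the u32-typed doorbell array:
 * one doorbell spans doorbell_size / sizeof(u32) slots. */
static unsigned int doorbell_inx_to_u32_off(unsigned int inx,
					    size_t doorbell_size)
{
	return inx * (unsigned int)(doorbell_size / sizeof(uint32_t));
}

/* u32 offset -> bitmap index, the exact inverse of the above. */
static unsigned int doorbell_u32_off_to_inx(unsigned int off,
					    size_t doorbell_size)
{
	return (unsigned int)(off * sizeof(uint32_t) / doorbell_size);
}
```

With doorbell_size == 8 this reduces to the `* 2` / `/ 2` of the v3 patch; with doorbell_size == 4 it is the identity, which is why this form works on both old and new GPUs. Felix's concern above is that amdgpu_doorbell_index_on_bar hard-codes the factor 2 on the BAR side, so the CPU-address side should match whichever convention is chosen.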
Re: [PATCH] drm/amdkfd: Fix unaligned 64-bit doorbell warning
+Shashank, FYI. I believe this is a regression from your patch "drm/amdgpu: use doorbell mgr for kfd kernel doorbells". On 2023-08-29 12:16, Mukul Joshi wrote: This patch fixes the following unaligned 64-bit doorbell warning seen when submitting packets on HIQ on GFX v9.4.3 by making the HIQ doorbell 64-bit aligned. The warning is seen when GPU is loaded in any mode other than SPX mode. [ +0.000301] [ cut here ] [ +0.03] Unaligned 64-bit doorbell [ +0.30] WARNING: /amdkfd/kfd_doorbell.c:339 write_kernel_doorbell64+0x72/0x80 [amdgpu] [ +0.03] RIP: 0010:write_kernel_doorbell64+0x72/0x80 [amdgpu] [ +0.04] RSP: 0018:c90004287730 EFLAGS: 00010246 [ +0.05] RAX: RBX: RCX: [ +0.03] RDX: 0001 RSI: 82837c71 RDI: [ +0.03] RBP: c90004287748 R08: 0003 R09: 0001 [ +0.02] R10: 001a R11: 88a034008198 R12: c900013bd004 [ +0.03] R13: 0008 R14: c900042877b0 R15: 007f [ +0.03] FS: 7fa8c7b62000() GS:889f8840() knlGS: [ +0.04] CS: 0010 DS: ES: CR0: 80050033 [ +0.03] CR2: 56111c45aaf0 CR3: 0001414f2002 CR4: 00770ee0 [ +0.03] PKRU: 5554 [ +0.02] Call Trace: [ +0.04] [ +0.06] kq_submit_packet+0x45/0x50 [amdgpu] [ +0.000524] pm_send_set_resources+0x7f/0xc0 [amdgpu] [ +0.000500] set_sched_resources+0xe4/0x160 [amdgpu] [ +0.000503] start_cpsch+0x1c5/0x2a0 [amdgpu] [ +0.000497] kgd2kfd_device_init.cold+0x816/0xb42 [amdgpu] [ +0.000743] amdgpu_amdkfd_device_init+0x15f/0x1f0 [amdgpu] [ +0.000602] amdgpu_device_init.cold+0x1813/0x2176 [amdgpu] [ +0.000684] ? pci_bus_read_config_word+0x4a/0x80 [ +0.12] ? do_pci_enable_device+0xdc/0x110 [ +0.08] amdgpu_driver_load_kms+0x1a/0x110 [amdgpu] [ +0.000545] amdgpu_pci_probe+0x197/0x400 [amdgpu] Signed-off-by: Mukul Joshi This should have a Fixes tag: Fixes: cfeaeb3c0ce7 ("drm/amdgpu: use doorbell mgr for kfd kernel doorbells") The original code before that patch used "* sizeof(u32) / kfd->device_info.doorbell_size" instead of "* 2". May be safer to restore the original calculation to have the correct doorbell size on old and new GPUs. 
Regards,
  Felix

---
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index c2e0b79dcc6d..b1c2772c3a8d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -168,7 +168,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 			"     doorbell index    == 0x%x\n",
 		*doorbell_off, inx);
 
-	return kfd->doorbell_kernel_ptr + inx;
+	return kfd->doorbell_kernel_ptr + inx * 2;
 }
 
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
@@ -176,6 +176,7 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
 	unsigned int inx;
 
 	inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
+	inx /= 2;
 
 	mutex_lock(&kfd->doorbell_mutex);
 	__clear_bit(inx, kfd->doorbell_bitmap);
Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults
On 2023-08-28 16:57, Chen, Xiaogang wrote: On 8/28/2023 2:06 PM, Felix Kuehling wrote: On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. Now a svm range may have pages from both system ram and vram of one gpu. These chagnes are expected to improve migration performance and reduce mmu callback and TLB flush workloads. Signed-off-by: xiaogang chen --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 153 +++ drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 87 - drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 7 +- 4 files changed, 162 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 7d82c7da223a..5a3aa80a1834 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -479,6 +479,8 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, * svm_migrate_ram_to_vram - migrate svm range from system to device * @prange: range structure * @best_loc: the device to migrate to + * @start_mgr: start page to migrate + * @last_mgr: last page to migrate * @mm: the process mm structure * @trigger: reason of migration * @@ -489,6 +491,7 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, */ static int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, + unsigned long start_mgr, unsigned long last_mgr, struct mm_struct *mm, uint32_t trigger) { unsigned long addr, start, end; @@ -498,9 +501,9 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, unsigned long cpages = 0; long r = 0; - if (prange->actual_loc == best_loc) { - pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n", - prange->svms, prange->start, prange->last, best_loc); + if (!best_loc) { + pr_debug("request svms 0x%p 
[0x%lx 0x%lx] migrate to sys ram\n", + prange->svms, start_mgr, last_mgr); return 0; } @@ -513,8 +516,8 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, pr_debug("svms 0x%p [0x%lx 0x%lx] to gpu 0x%x\n", prange->svms, prange->start, prange->last, best_loc); - start = prange->start << PAGE_SHIFT; - end = (prange->last + 1) << PAGE_SHIFT; + start = start_mgr << PAGE_SHIFT; + end = (last_mgr + 1) << PAGE_SHIFT; r = svm_range_vram_node_new(node, prange, true); if (r) { @@ -544,10 +547,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, if (cpages) { prange->actual_loc = best_loc; - svm_range_free_dma_mappings(prange, true); - } else { + /* only free dma mapping in the migrated range */ + svm_range_free_dma_mappings(prange, true, start_mgr - prange->start, + last_mgr - start_mgr + 1); This is wrong. If we only migrated some of the pages, we should not free the DMA mapping array at all. The array is needed as long as there are any valid DMA mappings in it. yes, I realized it after submit. I can not free DMA mapping array at this stage. The concern(also related to comments below) is I do not know how many pages in vram after partial migration. Originally I used bitmap to record that. I used bitmap to record which pages were migrated at each migration functions. Here I do not need use hmm function to get that info, inside each migration function we can know which pages got migrated, then update the bitmap accordingly inside each migration function. I think the condition above with cpages should be updated. Instead of cpages, we need to keep track of a count of pages in VRAM in struct svm_range. See more below. I think you want add a new integer in svm_range to remember how many pages are in vram side for each svm_range, instead of bitmap. There is a problem I saw: when we need split a prange(such as user uses set_attr api) how do we know how many pages in vram for each splitted prange? Right, that's a bit problematic. 
But it should be a relatively rare corner case. It may be good enough to make a "pessimistic" assumption when splitting ranges that have some pages in VRAM, that everything is in VRAM. And update that to 0 after migrate_to_ram for the entire range, to allow the BO reference to be released. So in the worst case, you keep your DMA addresses and BOs allocated slightly longer than necessary. If that doesn't work, I agree that we need a bitmap with one bit per 4KB page. But I hope that can be avoided. That said, I'm not actually sure w
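The "pessimistic" accounting Felix proposes above can be sketched as a small helper. The per-range `vram_pages` counter and the function name are hypothetical; the point is only the policy: when a parent range with any VRAM pages is split, each child assumes all of its pages are in VRAM, and the count drops to 0 only after the whole child migrates back to system RAM:

```c
#include <assert.h>

/* Hypothetical helper: how many pages should a child range created by a
 * split assume are in VRAM? Pessimistic answer: if the parent had any
 * VRAM pages, assume every page of the child is in VRAM, so DMA address
 * arrays and the VRAM BO reference are kept until proven unnecessary. */
static unsigned long pessimistic_child_vram_pages(unsigned long child_start,
						  unsigned long child_last,
						  unsigned long parent_vram_pages)
{
	if (!parent_vram_pages)
		return 0;	/* parent fully in system RAM: children too */
	/* otherwise assume the worst: every child page is in VRAM */
	return child_last - child_start + 1;
}
```

The cost of the over-estimate is bounded: DMA addresses and BO references may stay allocated slightly longer than necessary, which is what makes this cheaper than tracking a per-4KB-page bitmap.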
Re: [PATCH] drm/amdkfd: Use partial migrations in GPU page faults
On 2023-08-24 18:08, Xiaogang.Chen wrote: From: Xiaogang Chen This patch implements partial migration in gpu page fault according to migration granularity(default 2MB) and not split svm range in cpu page fault handling. Now a svm range may have pages from both system ram and vram of one gpu. These chagnes are expected to improve migration performance and reduce mmu callback and TLB flush workloads. Signed-off-by: xiaogang chen --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 153 +++ drivers/gpu/drm/amd/amdkfd/kfd_migrate.h | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 87 - drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 7 +- 4 files changed, 162 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 7d82c7da223a..5a3aa80a1834 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -479,6 +479,8 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, * svm_migrate_ram_to_vram - migrate svm range from system to device * @prange: range structure * @best_loc: the device to migrate to + * @start_mgr: start page to migrate + * @last_mgr: last page to migrate * @mm: the process mm structure * @trigger: reason of migration * @@ -489,6 +491,7 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, */ static int svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, + unsigned long start_mgr, unsigned long last_mgr, struct mm_struct *mm, uint32_t trigger) { unsigned long addr, start, end; @@ -498,9 +501,9 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, unsigned long cpages = 0; long r = 0; - if (prange->actual_loc == best_loc) { - pr_debug("svms 0x%p [0x%lx 0x%lx] already on best_loc 0x%x\n", -prange->svms, prange->start, prange->last, best_loc); + if (!best_loc) { + pr_debug("request svms 0x%p [0x%lx 0x%lx] migrate to sys ram\n", +prange->svms, start_mgr, last_mgr); return 0; } @@ 
-513,8 +516,8 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, pr_debug("svms 0x%p [0x%lx 0x%lx] to gpu 0x%x\n", prange->svms, prange->start, prange->last, best_loc); - start = prange->start << PAGE_SHIFT; - end = (prange->last + 1) << PAGE_SHIFT; + start = start_mgr << PAGE_SHIFT; + end = (last_mgr + 1) << PAGE_SHIFT; r = svm_range_vram_node_new(node, prange, true); if (r) { @@ -544,10 +547,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, if (cpages) { prange->actual_loc = best_loc; - svm_range_free_dma_mappings(prange, true); - } else { + /* only free dma mapping in the migrated range */ + svm_range_free_dma_mappings(prange, true, start_mgr - prange->start, +last_mgr - start_mgr + 1); This is wrong. If we only migrated some of the pages, we should not free the DMA mapping array at all. The array is needed as long as there are any valid DMA mappings in it. I think the condition above with cpages should be updated. Instead of cpages, we need to keep track of a count of pages in VRAM in struct svm_range. See more below. + } else if (!prange->actual_loc) + /* if all pages from prange are at sys ram */ svm_range_vram_node_free(prange); - } return r < 0 ? 
r : 0; } @@ -762,6 +767,8 @@ svm_migrate_vma_to_ram(struct kfd_node *node, struct svm_range *prange, * svm_migrate_vram_to_ram - migrate svm range from device to system * @prange: range structure * @mm: process mm, use current->mm if NULL + * @start_mgr: start page need be migrated to sys ram + * @last_mgr: last page need be migrated to sys ram * @trigger: reason of migration * @fault_page: is from vmf->page, svm_migrate_to_ram(), this is CPU page fault callback * @@ -771,7 +778,8 @@ svm_migrate_vma_to_ram(struct kfd_node *node, struct svm_range *prange, * 0 - OK, otherwise error code */ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm, - uint32_t trigger, struct page *fault_page) + unsigned long start_mgr, unsigned long last_mgr, + uint32_t trigger, struct page *fault_page) { struct kfd_node *node; struct vm_area_struct *vma; @@ -781,23 +789,30 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm, unsigned long upages = 0; long r = 0; + /* this prange has no vram pages to migrate to sys ram */ if
Re: [PATCH] drm/amdkfd: Add missing gfx11 MQD manager callbacks
On 2023-08-25 17:30, Harish Kasiviswanathan wrote: From: Jay Cornwall mqd_stride function was introduced in commit 129c7b6a0217 ("drm/amdkfd: Update MQD management on multi XCC setup") but not assigned for gfx11. Fixes a NULL dereference in debugfs. Signed-off-by: Jay Cornwall Signed-off-by: Harish Kasiviswanathan Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c index 2319467d2d95..0bbf0edbabd4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c @@ -457,6 +457,7 @@ struct mqd_manager *mqd_manager_init_v11(enum KFD_MQD_TYPE type, mqd->is_occupied = kfd_is_occupied_cp; mqd->mqd_size = sizeof(struct v11_compute_mqd); mqd->get_wave_state = get_wave_state; + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd; #endif @@ -472,6 +473,7 @@ struct mqd_manager *mqd_manager_init_v11(enum KFD_MQD_TYPE type, mqd->destroy_mqd = destroy_hiq_mqd; mqd->is_occupied = kfd_is_occupied_cp; mqd->mqd_size = sizeof(struct v11_compute_mqd); + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd; #endif @@ -501,6 +503,7 @@ struct mqd_manager *mqd_manager_init_v11(enum KFD_MQD_TYPE type, mqd->destroy_mqd = kfd_destroy_mqd_sdma; mqd->is_occupied = kfd_is_occupied_sdma; mqd->mqd_size = sizeof(struct v11_sdma_mqd); + mqd->mqd_stride = kfd_mqd_stride; #if defined(CONFIG_DEBUG_FS) mqd->debugfs_show_mqd = debugfs_show_mqd_sdma; #endif
Re: [PATCH] drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
On 2023-08-28 11:35, Alex Sierra wrote: Interrupt sq data bits were not taken properly from contextid0 and contextid1. Use macro KFD_CONTEXT_ID_GET_SQ_INT_DATA instead. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c index f0731a6a5306..830396b1c3b1 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c @@ -384,7 +384,7 @@ static void event_interrupt_wq_v9(struct kfd_node *dev, default: break; } - kfd_signal_event_interrupt(pasid, context_id0 & 0xff, 24); + kfd_signal_event_interrupt(pasid, sq_int_data, 24); } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) { kfd_set_dbg_ev_from_interrupt(dev, pasid, KFD_DEBUG_DOORBELL_ID(context_id0),
Re: [PATCH v2] drm/amdkfd: Replace pr_err with dev_err
On 2023-08-26 09:41, Asad Kamal wrote: Replace pr_err with dev_err to show the bus-id of failing device with kfd queue errors Signed-off-by: Asad Kamal Reviewed-by: Lijo Lazar Reviewed-by: Felix Kuehling --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 116 +++--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 2 files changed, 71 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index b166f30f083e..cd6cfffd6436 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -232,8 +232,8 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q, queue_type = convert_to_mes_queue_type(q->properties.type); if (queue_type < 0) { - pr_err("Queue type not supported with MES, queue:%d\n", - q->properties.type); + dev_err(adev->dev, "Queue type not supported with MES, queue:%d\n", + q->properties.type); return -EINVAL; } queue_input.queue_type = (uint32_t)queue_type; @@ -244,9 +244,9 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q, r = adev->mes.funcs->add_hw_queue(&adev->mes, &queue_input); amdgpu_mes_unlock(&adev->mes); if (r) { - pr_err("failed to add hardware queue to MES, doorbell=0x%x\n", + dev_err(adev->dev, "failed to add hardware queue to MES, doorbell=0x%x\n", q->properties.doorbell_off); - pr_err("MES might be in unrecoverable state, issue a GPU reset\n"); + dev_err(adev->dev, "MES might be in unrecoverable state, issue a GPU reset\n"); kfd_hws_hang(dqm); } @@ -272,9 +272,9 @@ static int remove_queue_mes(struct device_queue_manager *dqm, struct queue *q, amdgpu_mes_unlock(&adev->mes); if (r) { - pr_err("failed to remove hardware queue from MES, doorbell=0x%x\n", + dev_err(adev->dev, "failed to remove hardware queue from MES, doorbell=0x%x\n", q->properties.doorbell_off); - pr_err("MES might be in unrecoverable state, issue a GPU reset\n"); + dev_err(adev->dev, "MES
might be in unrecoverable state, issue a GPU reset\n"); kfd_hws_hang(dqm); } @@ -284,6 +284,7 @@ static int remove_queue_mes(struct device_queue_manager *dqm, struct queue *q, static int remove_all_queues_mes(struct device_queue_manager *dqm) { struct device_process_node *cur; + struct device *dev = dqm->dev->adev->dev; struct qcm_process_device *qpd; struct queue *q; int retval = 0; @@ -294,7 +295,7 @@ static int remove_all_queues_mes(struct device_queue_manager *dqm) if (q->properties.is_active) { retval = remove_queue_mes(dqm, q, qpd); if (retval) { - pr_err("%s: Failed to remove queue %d for dev %d", + dev_err(dev, "%s: Failed to remove queue %d for dev %d", __func__, q->properties.queue_id, dqm->dev->id); @@ -443,6 +444,7 @@ static int allocate_vmid(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q) { + struct device *dev = dqm->dev->adev->dev; int allocated_vmid = -1, i; for (i = dqm->dev->vm_info.first_vmid_kfd; @@ -454,7 +456,7 @@ static int allocate_vmid(struct device_queue_manager *dqm, } if (allocated_vmid < 0) { - pr_err("no more vmid to allocate\n"); + dev_err(dev, "no more vmid to allocate\n"); return -ENOSPC; } @@ -510,10 +512,12 @@ static void deallocate_vmid(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q) { + struct device *dev = dqm->dev->adev->dev; + /* On GFX v7, CP doesn't flush TC at dequeue */ if (q->device->adev->asic_type == CHIP_HAWAII) if (flush_texture_cache_nocpsch(q->device, qpd)) - pr_err("Failed to flush TC\n"); + dev_err(dev, "Failed to flush TC\n"); kfd_flush_tlb(qpd_to_pdd(qpd), TLB_FLUSH_LEGACY); @@ -708
Re: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default
On 2023-08-22 11:41, Deucher, Alexander wrote: [Public] -Original Message- From: Sasha Levin Sent: Tuesday, August 22, 2023 7:37 AM To: linux-ker...@vger.kernel.org; sta...@vger.kernel.org Cc: Deucher, Alexander ; Kuehling, Felix ; Koenig, Christian ; Mike Lothian ; Sasha Levin ; Pan, Xinhui ; airl...@gmail.com; dan...@ffwll.ch; amd- g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org Subject: [PATCH AUTOSEL 5.15 6/6] drm/amdkfd: ignore crat by default From: Alex Deucher [ Upstream commit a6dea2d64ff92851e68cd4e20a35f6534286e016 ] We are dropping the IOMMUv2 path, so no need to enable this. It's often buggy on consumer platforms anyway. This is not needed for stable. I agree. I was about to comment in the 5.10 patch as well. Regards, Felix Alex Reviewed-by: Felix Kuehling Acked-by: Christian König Tested-by: Mike Lothian Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index e574aa32a111d..46dfd9baeb013 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1523,11 +1523,7 @@ static bool kfd_ignore_crat(void) if (ignore_crat) return true; -#ifndef KFD_SUPPORT_IOMMU_V2 ret = true; -#else - ret = false; -#endif return ret; } -- 2.40.1
Re: [PATCH] drm/amdgpu: Rework memory limits to allow big allocations
On 2023-08-22 9:49, Bhardwaj, Rajneesh wrote: On 8/21/2023 4:32 PM, Felix Kuehling wrote: On 2023-08-21 15:20, Rajneesh Bhardwaj wrote: Rework the KFD max system memory and ttm limit to allow bigger system memory allocations up to 63/64 of the available memory which is controlled by ttm module params pages_limit and page_pool_size. Also for NPS1 mode, report the max ttm limit as the available VRAM size. For max system memory limit, leave 1GB exclusively outside ROCm allocations i.e. on 16GB system, >14 GB can be used by ROCm still leaving some memory for other system applications and on 128GB systems (e.g. GFXIP 9.4.3 APU in NPS1 mode) nearly >120GB can be used by ROCm. Signed-off-by: Rajneesh Bhardwaj --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 25 +-- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 9e18fe5eb190..3387dcdf1bc9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -44,6 +44,7 @@ * changes to accumulate */ #define AMDGPU_USERPTR_RESTORE_DELAY_MS 1 +#define ONE_GB (1UL << 30) /* * Align VRAM availability to 2MB to avoid fragmentation caused by 4K allocations in the tail 2MB @@ -117,11 +118,11 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void) return; si_meminfo(&si); - mem = si.freeram - si.freehigh; + mem = si.totalram - si.totalhigh; mem *= si.mem_unit; spin_lock_init(&kfd_mem_limit.mem_limit_lock); - kfd_mem_limit.max_system_mem_limit = mem - (mem >> 4); + kfd_mem_limit.max_system_mem_limit = mem - (mem >> 6) - (ONE_GB); I believe this is an OK heuristic for large systems and medium-sized systems. But it produces a negative number or an underflow for systems with very small system memory (about 1.1GB).
It's not practical to run ROCm on such a small system, but the code at least needs to be robust here and produce something meaningful. E.g. Sure, I agree. kfd_mem_limit.max_system_mem_limit = mem - (mem >> 6); if (kfd_mem_limit.max_system_mem_limit < 2 * ONE_GB) kfd_mem_limit.max_system_mem_limit >>= 1; else kfd_mem_limit.max_system_mem_limit -= ONE_GB; Since this change affects all GPUs and the change below is specific to GFXv9.4.3 APUs, I'd separate this into two patches. Ok, will split into two changes. kfd_mem_limit.max_ttm_mem_limit = ttm_tt_pages_limit() << PAGE_SHIFT; pr_debug("Kernel memory limit %lluM, TTM limit %lluM\n", (kfd_mem_limit.max_system_mem_limit >> 20), diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 8447fcada8bb..4962e35df617 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -25,6 +25,7 @@ #include #include +#include #include "amdgpu.h" @@ -1877,6 +1878,7 @@ static void gmc_v9_0_init_acpi_mem_ranges(struct amdgpu_device *adev, struct amdgpu_mem_partition_info *mem_ranges) { + uint64_t max_ttm_size = ttm_tt_pages_limit() << PAGE_SHIFT; int num_ranges = 0, ret, mem_groups; struct amdgpu_numa_info numa_info; int node_ids[MAX_MEM_RANGES]; @@ -1913,8 +1915,17 @@ gmc_v9_0_init_acpi_mem_ranges(struct amdgpu_device *adev, /* If there is only partition, don't use entire size */ if (adev->gmc.num_mem_partitions == 1) { - mem_ranges[0].size = mem_ranges[0].size * (mem_groups - 1); - do_div(mem_ranges[0].size, mem_groups); + if (max_ttm_size > mem_ranges[0].size || max_ttm_size <= 0) { This gives some weird discontinuous behaviour. For max_ttm_size > mem_ranges[0].size it gives you 3/4. For max_ttm_size == mem_ranges[0].size it gives you all the memory. Also, why is this only applied for num_mem_partitions == 1? The TTM limit also applies when there are more memory partitions.
Would it make more sense to always evenly divide the ttm_tt_pages_limit between all the memory partitions? And cap the size at the NUMA node size. I think that would eliminate special cases for different memory-partition configs and give you sensible behaviour in all cases. I think TTM doesn't check what values are being passed to pages_limit or page_pool_size so when the user passes an arbitrary number here, I wanted to retain the default behavior for NPS1 mode i.e. 3/4th of the available NUMA memory should be reported as VRAM. Also for >NPS1 mode, the partition size is already proportionately divided i.e in TPX/NPS4 mode, we have 1/4th NUMA memory visible as VRAM but KFD limits will be already bigger than that and we will be capped by VRAM size so this
Re: [PATCH] drm/amdgpu: Use READ_ONCE() when reading the values in 'sdma_v4_4_2_ring_get_rptr'
Would it make sense to include a link to a better explanation of the underlying issue? E.g. https://lwn.net/Articles/624126/? Regards, Felix On 2023-08-21 07:23, Christian König wrote: On 04.08.23 at 07:46, Srinivasan Shanmugam wrote: Instead of declaring pointers use READ_ONCE(), when accessing those values to make sure that the compiler doesn't violate any cache coherences That commit message is a bit confusing and not 100% technically correct. The compiler is not causing any cache coherency issues, but potentially re-ordering things or reading the value multiple times. Just write something like "Use READ_ONCE() instead of declaring the pointer volatile.". The background explanation would exceed the information suitable for a commit message anyway. Apart from that looks good to me, Christian. Cc: Guchun Chen Cc: Christian König Cc: Alex Deucher Cc: "Pan, Xinhui" Cc: Le Ma Cc: Hawking Zhang Signed-off-by: Srinivasan Shanmugam --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c index f413898dda37..267c1b7b8dcd 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c @@ -154,13 +154,13 @@ static int sdma_v4_4_2_init_microcode(struct amdgpu_device *adev) */ static uint64_t sdma_v4_4_2_ring_get_rptr(struct amdgpu_ring *ring) { - u64 *rptr; + u64 rptr; /* XXX check if swapping is necessary on BE */ - rptr = ((u64 *)&ring->adev->wb.wb[ring->rptr_offs]); + rptr = READ_ONCE(*((u64 *)&ring->adev->wb.wb[ring->rptr_offs])); - DRM_DEBUG("rptr before shift == 0x%016llx\n", *rptr); - return ((*rptr) >> 2); + DRM_DEBUG("rptr before shift == 0x%016llx\n", rptr); + return rptr >> 2; } /**
Re: [PATCH] drm/amdkfd: Share the original BO for GTT mapping
On 2023-08-21 15:29, Philip Yang wrote: If mGPUs are on the same IOMMU group, or ram is direct mapped, then mGPUs can share the original BO for the GTT mapping dma address, without creating a new BO from dmabuf export/import. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 282879c3441a..b5b940485059 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -864,9 +864,10 @@ static int kfd_mem_attach(struct amdgpu_device *adev, struct kgd_mem *mem, if ((adev == bo_adev && !(mem->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)) || (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && reuse_dmamap(adev, bo_adev)) || - same_hive) { + (mem->domain == AMDGPU_GEM_DOMAIN_GTT && reuse_dmamap(adev, bo_adev)) || + same_hive) { /* Mappings on the local GPU, or VRAM mappings in the -* local hive, or userptr mapping can reuse dma map +* local hive, or userptr, or GTT mapping can reuse dma map * address space share the original BO */ attachment[i]->type = KFD_MEM_ATT_SHARED;
Re: [PATCH] drm/amdgpu: Rework memory limits to allow big allocations
On 2023-08-21 15:20, Rajneesh Bhardwaj wrote: Rework the KFD max system memory and ttm limit to allow bigger system memory allocations up to 63/64 of the available memory which is controlled by ttm module params pages_limit and page_pool_size. Also for NPS1 mode, report the max ttm limit as the available VRAM size. For max system memory limit, leave 1GB exclusively outside ROCm allocations i.e. on 16GB system, >14 GB can be used by ROCm still leaving some memory for other system applications and on 128GB systems (e.g. GFXIP 9.4.3 APU in NPS1 mode) nearly >120GB can be used by ROCm. Signed-off-by: Rajneesh Bhardwaj --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 25 +-- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 9e18fe5eb190..3387dcdf1bc9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -44,6 +44,7 @@ * changes to accumulate */ #define AMDGPU_USERPTR_RESTORE_DELAY_MS 1 +#define ONE_GB (1UL << 30) /* * Align VRAM availability to 2MB to avoid fragmentation caused by 4K allocations in the tail 2MB @@ -117,11 +118,11 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void) return; si_meminfo(&si); - mem = si.freeram - si.freehigh; + mem = si.totalram - si.totalhigh; mem *= si.mem_unit; spin_lock_init(&kfd_mem_limit.mem_limit_lock); - kfd_mem_limit.max_system_mem_limit = mem - (mem >> 4); + kfd_mem_limit.max_system_mem_limit = mem - (mem >> 6) - (ONE_GB); I believe this is an OK heuristic for large systems and medium-sized systems. But it produces a negative number or an underflow for systems with very small system memory (about 1.1GB). It's not practical to run ROCm on such a small system, but the code at least needs to be robust here and produce something meaningful. E.g.
kfd_mem_limit.max_system_mem_limit = mem - (mem >> 6); if (kfd_mem_limit.max_system_mem_limit < 2 * ONE_GB) kfd_mem_limit.max_system_mem_limit >>= 1; else kfd_mem_limit.max_system_mem_limit -= ONE_GB; Since this change affects all GPUs and the change below is specific to GFXv9.4.3 APUs, I'd separate this into two patches. kfd_mem_limit.max_ttm_mem_limit = ttm_tt_pages_limit() << PAGE_SHIFT; pr_debug("Kernel memory limit %lluM, TTM limit %lluM\n", (kfd_mem_limit.max_system_mem_limit >> 20), diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 8447fcada8bb..4962e35df617 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -25,6 +25,7 @@ #include #include +#include #include "amdgpu.h" #include "gmc_v9_0.h" @@ -1877,6 +1878,7 @@ static void gmc_v9_0_init_acpi_mem_ranges(struct amdgpu_device *adev, struct amdgpu_mem_partition_info *mem_ranges) { + uint64_t max_ttm_size = ttm_tt_pages_limit() << PAGE_SHIFT; int num_ranges = 0, ret, mem_groups; struct amdgpu_numa_info numa_info; int node_ids[MAX_MEM_RANGES]; @@ -1913,8 +1915,17 @@ gmc_v9_0_init_acpi_mem_ranges(struct amdgpu_device *adev, /* If there is only partition, don't use entire size */ if (adev->gmc.num_mem_partitions == 1) { - mem_ranges[0].size = mem_ranges[0].size * (mem_groups - 1); - do_div(mem_ranges[0].size, mem_groups); + if (max_ttm_size > mem_ranges[0].size || max_ttm_size <= 0) { This gives some weird discontinuous behaviour. For max_ttm_size > mem_ranges[0].size it gives you 3/4. For max_ttm_size == mem_ranges[0].size it gives you all the memory. Also, why is this only applied for num_mem_partitions == 1? The TTM limit also applies when there are more memory partitions. Would it make more sense to always evenly divide the ttm_tt_pages_limit between all the memory partitions? And cap the size at the NUMA node size.
I think that would eliminate special cases for different memory-partition configs and give you sensible behaviour in all cases. Regards, Felix + /* Report VRAM as 3/4th of available numa memory */ + mem_ranges[0].size = mem_ranges[0].size * (mem_groups - 1); + do_div(mem_ranges[0].size, mem_groups); + } else { + /* Report VRAM as set by ttm.pages_limit or default ttm +* limit which is 1/2 of system memory +*/ + mem_ranges[0].size = max_ttm_size; + } + pr_debug("NPS1 mode, setting VRAM size = %llu\n", mem_ranges[0].size); } } @@ -2159,6 +2170,11 @@ static
Re: [PATCH] drm/amdkfd: use correct method to get clock under SRIOV
On 2023-08-17 07:08, Horace Chen wrote: [What] Current SRIOV code is still using adev->clock.default_XX, which is read from atomfirmware. But these fields were abandoned in atomfirmware long ago, which may cause the function to return a 0 value. [How] We don't need to check whether this is SR-IOV. For SR-IOV one-vf-mode, pm is enabled and the VF is able to read the dpm clock from pmfw, so we can use the dpm clock interface directly. For multi-VF mode, VF pm is disabled, so the driver can just behave as if pm is disabled. One-vf-mode is introduced from GFX9 so it shall not have any backward compatibility issue. Signed-off-by: Horace Chen Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index df633e9ce920..cdf6087706aa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c @@ -442,9 +442,7 @@ void amdgpu_amdkfd_get_local_mem_info(struct amdgpu_device *adev, mem_info->local_mem_size_public, mem_info->local_mem_size_private); - if (amdgpu_sriov_vf(adev)) - mem_info->mem_clk_max = adev->clock.default_mclk / 100; - else if (adev->pm.dpm_enabled) { + if (adev->pm.dpm_enabled) { if (amdgpu_emu_mode == 1) mem_info->mem_clk_max = 0; else @@ -463,9 +461,7 @@ uint64_t amdgpu_amdkfd_get_gpu_clock_counter(struct amdgpu_device *adev) uint32_t amdgpu_amdkfd_get_max_engine_clock_in_mhz(struct amdgpu_device *adev) { /* the sclk is in quantas of 10kHz */ - if (amdgpu_sriov_vf(adev)) - return adev->clock.default_sclk / 100; - else if (adev->pm.dpm_enabled) + if (adev->pm.dpm_enabled) return amdgpu_dpm_get_sclk(adev, false) / 100; else return 100;
Re: [PATCH] drm/amdkfd: retry after EBUSY is returned from hmm_ranges_get_pages
On 2023-08-16 14:44, Alex Sierra wrote: If hmm_range_get_pages returns an EBUSY error during svm_range_validate_and_map, within the context of a page fault interrupt, it should be retried through the svm_range_restore_pages callback. Therefore we treat this as an EAGAIN error instead, and defer it to the restore pages fallback. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 93609ea42163..3ebd5d99f39e 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1685,6 +1685,8 @@ static int svm_range_validate_and_map(struct mm_struct *mm, WRITE_ONCE(p->svms.faulting_task, NULL); if (r) { pr_debug("failed %d to get svm range pages\n", r); + if (r == -EBUSY) + r = -EAGAIN; goto unreserve_out; }
Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU
If you have a complete kernel log, it may be worth looking at backtraces from other threads, to better understand the interactions. I'd expect that there is a thread there that's in an RCU read critical section. It may not be in our driver, though. If it's a customer system, it may also help to see the kernel config. Maybe the kernel was configured without preemption: - For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel without invoking schedule(). If the looping in the kernel is really expected and desirable behavior, you might need to add some calls to cond_resched(). But then I would expect cond_resched() to fix the problem, according to this document. Regards, Felix On 2023-08-11 17:27, Chen, Xiaogang wrote: On 8/11/2023 4:22 PM, Felix Kuehling wrote: On 2023-08-11 17:12, Chen, Xiaogang wrote: I know the original jira ticket. The system got an RCU cpu stall, then the kernel entered panic, then no response or ssh. This patch lets the prange list update task yield the cpu after each range update. It can prevent the task from holding the mm lock too long. Calling schedule does not drop the lock. If anything, it causes the lock to be held longer, because the function takes longer to complete. Regards, Felix Right. I do not see either how this patch targets the root cause. It is on a customer system that can have many RCU operations (not necessarily from our code). Any read critical section can cause write stall. I think we can use some RCU parameters first to see if things change: like config_rcu_cpu_stall_timeout to increase the grace period, or rcuupdate.rcu_cpu_stall_suppress to suppress the RCU stall. Regards Xiaogang mm lock is a rw_semaphore, not an RCU mechanism. Can you explain how that can prevent RCU cpu stall in this case? Regards Xiaogang On 8/11/2023 2:11 PM, James Zhu wrote: Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
update_list could be big in list_for_each_entry(prange, &update_list, update_list), mmap_read_lock(mm) is kept held all the time, adding schedule() can remove RCU stall on CPU for this case. RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu] Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8 RSP: 0018:c9000ffd7b10 EFLAGS: 0206 RAX: 0100 RBX: 88c493968d80 RCX: 88d1a6469b18 RDX: 88e18ef1ec80 RSI: c9000ffd7be0 RDI: 88c493968d38 RBP: 0003062e R08: 3042f000 R09: 3062efff R10: 1000 R11: 88c1ad255000 R12: 0003042f R13: 88c493968c00 R14: c9000ffd7be0 R15: 88c493968c00 __mmu_notifier_invalidate_range_start+0x132/0x1d0 ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu] migrate_vma_setup+0x6c7/0x8f0 ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu] svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu] svm_range_set_attr+0xe34/0x11a0 [amdgpu] kfd_ioctl+0x271/0x4e0 [amdgpu] ? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu] __x64_sys_ioctl+0x92/0xd0 Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 113fd11aa96e..9f2d48ade7fa 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, r = svm_range_trigger_migration(mm, prange, &migrated); if (r) goto out_unlock_range; + schedule(); if (migrated && (!p->xnack_enabled || (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) && -- 2.34.1
Re: [PATCH v3] drm/amdgpu: skip xcp drm device allocation when out of drm resource
On 2023-08-11 17:06, James Zhu wrote: Return 0 when drm device alloc failed with -ENOSPC in order to allow amdgpu driver loading. But the xcp without a drm device node assigned won't be visible in user space. This helps amdgpu driver loading on a system which has more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public, which will support up to 2^20 nodes totally. kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiar...@intel.com/T/ libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Signed-off-by: James Zhu Acked-by: Christian König Reviewed-by: Felix Kuehling -v2: added warning message -v3: use dev_warn --- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 13 - drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +- 2 files changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index 9c9cca129498..565a1fa436d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -239,8 +239,13 @@ static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev) for (i = 1; i < MAX_XCP; i++) { ret = amdgpu_xcp_drm_dev_alloc(&p_ddev); - if (ret) + if (ret == -ENOSPC) { + dev_warn(adev->dev, + "Skip xcp node #%d when out of drm node resource.", i); + return 0; + } else if (ret) { return ret; + } /* Redirect all IOCTLs to the primary device */ adev->xcp_mgr->xcp[i].rdev = p_ddev->render->dev; @@ -328,6 +333,9 @@ int amdgpu_xcp_dev_register(struct amdgpu_device *adev, return 0; for (i = 1; i < MAX_XCP; i++) { + if (!adev->xcp_mgr->xcp[i].ddev) + break; + ret = drm_dev_register(adev->xcp_mgr->xcp[i].ddev, ent->driver_data); if (ret) return ret; @@ -345,6 +353,9 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev) return; for (i = 1; i < MAX_XCP; i++) { + if (!adev->xcp_mgr->xcp[i].ddev) + break; + p_ddev = adev->xcp_mgr->xcp[i].ddev; drm_dev_unplug(p_ddev); p_ddev->render->dev =
adev->xcp_mgr->xcp[i].rdev; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3b0749390388..310df98ba46a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1969,8 +1969,16 @@ int kfd_topology_add_device(struct kfd_node *gpu) int i; const char *asic_name = amdgpu_asic_name[gpu->adev->asic_type]; + gpu_id = kfd_generate_gpu_id(gpu); - pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id); + if (!gpu->xcp->ddev) { + dev_warn(gpu->adev->dev, + "Won't add GPU (ID: 0x%x) to topology since it has no drm node assigned.", + gpu_id); + return 0; + } else { + pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id); + } /* Check to see if this gpu device exists in the topology_device_list. * If so, assign the gpu to that device,
Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU
On 2023-08-11 17:12, Chen, Xiaogang wrote: I know the original jira ticket. The system got an RCU cpu stall, then the kernel entered panic, then no response or ssh. This patch lets the prange list update task yield the cpu after each range update. It can prevent the task from holding the mm lock too long. Calling schedule does not drop the lock. If anything, it causes the lock to be held longer, because the function takes longer to complete. Regards, Felix mm lock is a rw_semaphore, not an RCU mechanism. Can you explain how that can prevent RCU cpu stall in this case? Regards Xiaogang On 8/11/2023 2:11 PM, James Zhu wrote: Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. update_list could be big in list_for_each_entry(prange, &update_list, update_list), mmap_read_lock(mm) is kept held all the time, adding schedule() can remove RCU stall on CPU for this case. RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu] Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8 RSP: 0018:c9000ffd7b10 EFLAGS: 0206 RAX: 0100 RBX: 88c493968d80 RCX: 88d1a6469b18 RDX: 88e18ef1ec80 RSI: c9000ffd7be0 RDI: 88c493968d38 RBP: 0003062e R08: 3042f000 R09: 3062efff R10: 1000 R11: 88c1ad255000 R12: 0003042f R13: 88c493968c00 R14: c9000ffd7be0 R15: 88c493968c00 __mmu_notifier_invalidate_range_start+0x132/0x1d0 ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu] migrate_vma_setup+0x6c7/0x8f0 ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu] svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu] svm_range_set_attr+0xe34/0x11a0 [amdgpu] kfd_ioctl+0x271/0x4e0 [amdgpu] ?
kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu] __x64_sys_ioctl+0x92/0xd0 Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 113fd11aa96e..9f2d48ade7fa 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, r = svm_range_trigger_migration(mm, prange, ); if (r) goto out_unlock_range; + schedule(); if (migrated && (!p->xnack_enabled || (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) && -- 2.34.1
Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU
I don't understand why this loop is causing a stall. These stall warnings indicate that there is an RCU grace period that's not making progress. That means there must be an RCU read-side critical section that's being blocked. But there is no RCU read-side critical section in the svm_range_set_attr function.

You mentioned the mmap-read-lock. But why is that causing an issue? Does it trigger any of the conditions listed in kernel/Documentation/RCU/stallwarn.rst?

- A CPU looping in an RCU read-side critical section.
- A CPU looping with interrupts disabled.
- A CPU looping with preemption disabled.
- A CPU looping with bottom halves disabled.

Or is there another thread that has an mmap_write_lock inside an RCU read critical section that's getting stalled by the mmap_read_lock?

Regards, Felix

On 2023-08-11 16:50, James Zhu wrote: On 2023-08-11 16:06, Felix Kuehling wrote: On 2023-08-11 15:11, James Zhu wrote:

update_list could be big in list_for_each_entry(prange, _list, update_list), mmap_read_lock(mm) is held the whole time; adding schedule() can remove the RCU stall on the CPU for this case.

RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]

You're just showing the backtrace here, but not what the problem is. Can you include more context, e.g. the message that says something about a stall?

[JZ] I attached more log here, and will update the patch later.
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: INFO: rcu_sched self-detected stall on CPU
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: #01134-: (59947 ticks this GP) idle=7f6/1/0x4000 softirq=1735/1735 fqs=29977
2023-07-20T14:15:39-04:00 frontier06693 kernel: #011(t=60006 jiffies g=3265905 q=15150)
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: CPU 34: RCU dump cpu stacks:
2023-07-20T14:15:39-04:00 frontier06693 kernel: NMI backtrace for cpu 34
2023-07-20T14:15:39-04:00 frontier06693 kernel: CPU: 34 PID: 72044 Comm: ncsd-it-hip.exe Kdump: loaded Tainted: G OE 5.14.21-150400.24.46_12.0.83-cray_shasta_c #1 SLE15-SP4 (unreleased)
2023-07-20T14:15:39-04:00 frontier06693 kernel: Hardware name: HPE HPE_CRAY_EX235A/HPE CRAY EX235A, BIOS 1.6.2 03-22-2023
2023-07-20T14:15:39-04:00 frontier06693 kernel: Call Trace:
2023-07-20T14:15:39-04:00 frontier06693 kernel:
2023-07-20T14:15:39-04:00 frontier06693 kernel: dump_stack_lvl+0x44/0x5b
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_cpu_backtrace+0xdd/0xe0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? lapic_can_unplug_cpu+0xa0/0xa0
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_trigger_cpumask_backtrace+0xfd/0x130
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_dump_cpu_stacks+0x13b/0x180
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_sched_clock_irq+0x6cb/0x930
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? trigger_load_balance+0x158/0x390
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? scheduler_tick+0xe1/0x290
2023-07-20T14:15:39-04:00 frontier06693 kernel: update_process_times+0x8c/0xb0
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_handle.isra.21+0x1d/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_timer+0x67/0x80
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: __hrtimer_run_queues+0xa0/0x2b0
2023-07-20T14:15:39-04:00 frontier06693 kernel: hrtimer_interrupt+0xe5/0x250
2023-07-20T14:15:39-04:00 frontier06693 kernel: __sysvec_apic_timer_interrupt+0x62/0x100
2023-07-20T14:15:39-04:00 frontier06693 kernel: sysvec_apic_timer_interrupt+0x4b/0x90
2023-07-20T14:15:39-04:00 frontier06693 kernel:
2023-07-20T14:15:39-04:00 frontier06693 kernel:
2023-07-20T14:15:39-04:00 frontier06693 kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
2023-07-20T14:15:39-04:00 frontier06693 kernel: RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
2023-07-20T14:15:39-04:00 frontier06693 kernel: RSP: 0018:c9000ffd7b10 EFLAGS: 0206
2023-07-20T14:15:39-04:00 frontier06693 kernel: RAX: 0100 RBX: 88c493968d80 RCX: 88d1a6469b18
2023-07-20T14:15:39-04:00 frontier06693 kernel: RDX: 88e18ef1ec80 RSI: c9000ffd7be0 RDI: 88c493968d38
2023-07-20T14:15:39-04:00 frontier06693 kernel: RBP: 0003062e R08: 3042f000 R09: 3062efff
2023-07-20T14:15:39-04:00 frontier06693 kernel: R10: 1000 R11: 88c1ad255000 R12: 0003042f
2023-07-20T14:15:39-04:00 frontier06693 kernel: R13: 88c493968c00 R14: c9000ffd7be0 R15: 88c493968c00
2023-07-20T14:15:39
Re: [PATCH v2] drm/amdgpu: skip xcp drm device allocation when out of drm resource
On 2023-08-11 16:23, James Zhu wrote:

Return 0 when the drm device allocation fails with -ENOSPC, in order to allow amdgpu driver loading. But an xcp without a drm device node assigned won't be visible in user space. This helps amdgpu driver loading on systems which have more than 64 nodes, the current limitation. The proposal to add more drm nodes is discussed in public and will support up to 2^20 nodes in total.

kernel drm: https://lore.kernel.org/lkml/20230724211428.3831636-1-michal.winiar...@intel.com/T/
libdrm: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305

Signed-off-by: James Zhu
Acked-by: Christian König

-v2: added warning message
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 13 -
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 +-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 9c9cca129498..f0754d70da5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
@@ -239,8 +239,13 @@ static int amdgpu_xcp_dev_alloc(struct amdgpu_device *adev)
 for (i = 1; i < MAX_XCP; i++) {
 ret = amdgpu_xcp_drm_dev_alloc(_ddev);
- if (ret)
+ if (ret == -ENOSPC) {
+ dev_WARN(adev->dev,
+ "Skip xcp node #%d when out of drm node resource.", i);

This prints a noisy backtrace. Maybe that's a bit too much. I'd just use dev_warn, so it only prints your message without a backtrace.
+ return 0; + } else if (ret) { return ret; + } /* Redirect all IOCTLs to the primary device */ adev->xcp_mgr->xcp[i].rdev = p_ddev->render->dev; @@ -328,6 +333,9 @@ int amdgpu_xcp_dev_register(struct amdgpu_device *adev, return 0; for (i = 1; i < MAX_XCP; i++) { + if (!adev->xcp_mgr->xcp[i].ddev) + break; + ret = drm_dev_register(adev->xcp_mgr->xcp[i].ddev, ent->driver_data); if (ret) return ret; @@ -345,6 +353,9 @@ void amdgpu_xcp_dev_unplug(struct amdgpu_device *adev) return; for (i = 1; i < MAX_XCP; i++) { + if (!adev->xcp_mgr->xcp[i].ddev) + break; + p_ddev = adev->xcp_mgr->xcp[i].ddev; drm_dev_unplug(p_ddev); p_ddev->render->dev = adev->xcp_mgr->xcp[i].rdev; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 3b0749390388..0f844151caaf 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1969,8 +1969,16 @@ int kfd_topology_add_device(struct kfd_node *gpu) int i; const char *asic_name = amdgpu_asic_name[gpu->adev->asic_type]; + gpu_id = kfd_generate_gpu_id(gpu); - pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id); + if (!gpu->xcp->ddev) { + dev_WARN(gpu->adev->dev, + "Won't add GPU (ID: 0x%x) to topology since it has no drm node assigned.", + gpu_id); Same as above. Regards, Felix + return 0; + } else { + pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id); + } /* Check to see if this gpu device exists in the topology_device_list. * If so, assign the gpu to that device,
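The control flow under discussion (downgrade -ENOSPC to success, leave the remaining xcp slots without a drm node, and have every consumer stop at the first empty slot) can be sketched in plain userspace C. Everything here — `fake_dev_alloc`, the slot struct, `NODE_LIMIT` — is a hypothetical stand-in for the real `amdgpu_xcp_drm_dev_alloc()` path, not the driver code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_XCP 8
#define NODE_LIMIT 5   /* pretend the subsystem runs out after 5 nodes */

struct fake_xcp { void *ddev; };
static struct fake_xcp xcps[MAX_XCP];
static int nodes_used;

/* Stand-in for amdgpu_xcp_drm_dev_alloc(): fails with -ENOSPC at the limit. */
static int fake_dev_alloc(void **out)
{
    static char devs[NODE_LIMIT];

    if (nodes_used >= NODE_LIMIT)
        return -ENOSPC;
    *out = &devs[nodes_used++];
    return 0;
}

/* Mirrors the patch: -ENOSPC is downgraded to success, later slots stay NULL. */
static int alloc_all_xcps(void)
{
    for (int i = 0; i < MAX_XCP; i++) {
        int ret = fake_dev_alloc(&xcps[i].ddev);

        if (ret == -ENOSPC)
            return 0;      /* partial init is fine; the real driver warns here */
        else if (ret)
            return ret;    /* any other error stays fatal */
    }
    return 0;
}

/* Consumers (register/unplug/topology) must skip the unassigned slots,
 * which is what the added `if (!...ddev) break;` checks do. */
static int count_usable_xcps(void)
{
    int n = 0;

    for (int i = 0; i < MAX_XCP; i++) {
        if (!xcps[i].ddev)
            break;
        n++;
    }
    return n;
}
```

The key design point Felix and James are converging on is that allocation failure past the node limit is a degraded-but-supported configuration, so every later iteration over the xcp array has to tolerate trailing NULL entries.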
Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU
On 2023-08-11 15:11, James Zhu wrote:

update_list could be big in list_for_each_entry(prange, _list, update_list), mmap_read_lock(mm) is held the whole time; adding schedule() can remove the RCU stall on the CPU for this case.

RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]

You're just showing the backtrace here, but not what the problem is. Can you include more context, e.g. the message that says something about a stall?

Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
RSP: 0018:c9000ffd7b10 EFLAGS: 0206
RAX: 0100 RBX: 88c493968d80 RCX: 88d1a6469b18
RDX: 88e18ef1ec80 RSI: c9000ffd7be0 RDI: 88c493968d38
RBP: 0003062e R08: 3042f000 R09: 3062efff
R10: 1000 R11: 88c1ad255000 R12: 0003042f
R13: 88c493968c00 R14: c9000ffd7be0 R15: 88c493968c00
__mmu_notifier_invalidate_range_start+0x132/0x1d0
? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
migrate_vma_setup+0x6c7/0x8f0
? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
svm_range_set_attr+0xe34/0x11a0 [amdgpu]
kfd_ioctl+0x271/0x4e0 [amdgpu]
? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
__x64_sys_ioctl+0x92/0xd0

Signed-off-by: James Zhu
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 113fd11aa96e..9f2d48ade7fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 r = svm_range_trigger_migration(mm, prange, );
 if (r)
 goto out_unlock_range;
+ schedule();

I'm not sure that unconditionally scheduling here in every loop iteration is a good solution. This could lead to performance degradation when there are many small ranges. I think a better option is to call cond_resched.
That would reschedule only "if necessary", though I haven't quite figured out the criteria for rescheduling being necessary.

Regards, Felix

if (migrated && (!p->xnack_enabled || (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&
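Felix's suggestion — yield the CPU only when actually needed, rather than unconditionally on every loop iteration — can be illustrated with a userspace sketch. In the kernel the right call is cond_resched(), which is close to free unless a reschedule is pending; the batch counter and sched_yield() below are only a rough userspace approximation of that behavior, and the function names are hypothetical:

```c
#include <sched.h>

/* Yield occasionally instead of once per iteration, so a workload with
 * many small ranges does not pay a context-switch penalty on every one.
 * In kernel code this check would simply be cond_resched(), which is a
 * no-op unless TIF_NEED_RESCHED is set; here a batch counter stands in
 * for that condition. */
#define RESCHED_BATCH 64

static long yields;

static void process_one_range(int idx)
{
    (void)idx;   /* stand-in for per-range work like migration triggering */
}

static void process_ranges(int nranges)
{
    for (int i = 0; i < nranges; i++) {
        process_one_range(i);
        if ((i + 1) % RESCHED_BATCH == 0) {
            sched_yield();   /* kernel code would call cond_resched() */
            yields++;
        }
    }
}
```

With 1000 ranges this yields the CPU 15 times instead of 1000, which is the performance concern Felix raises about the unconditional schedule().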
Re: [PATCH] drm/amdgpu: don't allow userspace to create a doorbell BO
Am 2023-08-09 um 15:09 schrieb Alex Deucher: We need the domains in amdgpu_drm.h for the kernel driver to manage the pool, but we don't want userspace using it until the code is ready. So reject for now. Signed-off-by: Alex Deucher Acked-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 693b1fd1191a..ca4d2d430e28 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -289,6 +289,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data, uint32_t handle, initial_domain; int r; + /* reject DOORBELLs until userspace code to use it is available */ + if (args->in.domains & AMDGPU_GEM_DOMAIN_DOORBELL) + return -EINVAL; + /* reject invalid gem flags */ if (flags & ~(AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED | AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
Am 2023-08-10 um 18:27 schrieb Eric Huang:

There is no UNMAP_QUEUES command sent for queue preemption because the queue is suspended and the test is close to the end. Function unmap_queue_cpsch will do nothing after that.

How do you suspend queues without sending an UNMAP_QUEUES command?

Regards, Felix

The workaround is new and only for gfx v9.4.2, because the debugger tests have changed to check whether all address watch points are correctly set, i.e. test A sets more than one watchpoint and leaves, the following test B only sets one watchpoint, and test A's settings will cause more than one watchpoint event, so test B checks and reports an error on a second or third watchpoint it did not set itself.

Regards, Eric

On 2023-08-10 17:56, Felix Kuehling wrote:

I think Jon is suggesting that the UNMAP_QUEUES command should clear the address watch registers. Requesting such a change from the HWS team may take a long time. That said, when was this workaround implemented and reviewed? Did I review it as part of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2.

Regards, Felix

On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to the HWS. At this point the FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found the problem. GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to the MES, which doesn't consider whether the queue is suspended. So GFX11 doesn't have this issue.
Regards, Eric On 2023-08-10 17:27, Kim, Jonathan wrote: [AMD Official Use Only - General] This is a strange solution because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug? Thanks, Jon -Original Message- From: Kuehling, Felix Sent: Thursday, August 10, 2023 5:03 PM To: Huang, JinHuiEric ; amd- g...@lists.freedesktop.org Cc: Kim, Jonathan Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2 I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is Reviewed-by: Felix Kuehling On 2023-08-10 16:47, Eric Huang wrote: KFD currently relies on MEC FW to clear tcp watch control register by sending MAP_PROCESS packet with 0 of field tcp_watch_cntl to HWS, but if the queue is suspended, the packet will not be sent and the previous value will be left on the register, that will affect the following apps. So the solution is to clear the register as gfx v9 in KFD. 
Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index e2fed6edbdd0..aff08321e976 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( return watch_address_cntl; } -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) -{ - return 0; -} - const struct kfd2kgd_calls aldebaran_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = { .set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override, .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode, .set_address_watch = kgd_gfx_aldebaran_set_address_watch, - .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch, + .clear_address_watch = kgd_gfx_v9_clear_address_watch, .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: avoid svm dump when dynamic debug disabled
Am 2023-08-11 um 06:11 schrieb Mike Lothian:

On Thu, 3 Aug 2023 at 20:43, Felix Kuehling wrote: Is your kernel configured without dynamic debugging? Maybe we need to wrap this in some #if defined(CONFIG_DYNAMIC_DEBUG_CORE).

Apologies, I thought I'd replied to this. Yes, I didn't have dynamic debugging enabled.

I submitted a fix for this by Arnd Bergmann: https://patchwork.freedesktop.org/patch/551367/. It should show up in Alex's public branch soon.

Regards, Felix
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
I think Jon is suggesting that the UNMAP_QUEUES command should clear the address watch registers. Requesting such a change from the HWS team may take a long time. That said, when was this workaround implemented and reviewed? Did I review it as part of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2.

Regards, Felix

On 2023-08-10 17:52, Eric Huang wrote:

The problem is that the queue is suspended before the clear address watch call in KFD; there is no queue preemption and queue resume after the clearing call, and the test ends. So there is no chance to send MAP_PROCESS to the HWS. At this point the FW has nothing to do. We have several test FWs from Tej, and none of them works, so I recalled the kernel debug log and found the problem. GFX11 has a different scheduler: when clear address watch is called, KFD directly sends MES_MISC_OP_SET_SHADER_DEBUGGER to the MES, which doesn't consider whether the queue is suspended. So GFX11 doesn't have this issue.

Regards, Eric

On 2023-08-10 17:27, Kim, Jonathan wrote:

[AMD Official Use Only - General]

This is a strange solution because the MEC should set watch controls as non-valid automatically on queue preemption to avoid this kind of issue in the first place by design. MAP_PROCESS on resume will take whatever the driver requests. GFX11 has no issue with letting the HWS do this. Are we sure we're not working around some HWS bug?

Thanks, Jon

-Original Message-
From: Kuehling, Felix
Sent: Thursday, August 10, 2023 5:03 PM
To: Huang, JinHuiEric ; amd- g...@lists.freedesktop.org
Cc: Kim, Jonathan
Subject: Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs.
That said, this patch is Reviewed-by: Felix Kuehling On 2023-08-10 16:47, Eric Huang wrote: KFD currently relies on MEC FW to clear tcp watch control register by sending MAP_PROCESS packet with 0 of field tcp_watch_cntl to HWS, but if the queue is suspended, the packet will not be sent and the previous value will be left on the register, that will affect the following apps. So the solution is to clear the register as gfx v9 in KFD. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index e2fed6edbdd0..aff08321e976 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( return watch_address_cntl; } -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) -{ - return 0; -} - const struct kfd2kgd_calls aldebaran_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = { .set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override, .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode, .set_address_watch = kgd_gfx_aldebaran_set_address_watch, - .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch, + .clear_address_watch = kgd_gfx_v9_clear_address_watch, .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2
I think amdgpu_amdkfd_gc_9_4_3.c needs a similar fix. But maybe a bit different because it needs to support multiple XCCs. That said, this patch is Reviewed-by: Felix Kuehling On 2023-08-10 16:47, Eric Huang wrote: KFD currently relies on MEC FW to clear tcp watch control register by sending MAP_PROCESS packet with 0 of field tcp_watch_cntl to HWS, but if the queue is suspended, the packet will not be sent and the previous value will be left on the register, that will affect the following apps. So the solution is to clear the register as gfx v9 in KFD. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index e2fed6edbdd0..aff08321e976 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -163,12 +163,6 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( return watch_address_cntl; } -static uint32_t kgd_gfx_aldebaran_clear_address_watch(struct amdgpu_device *adev, - uint32_t watch_id) -{ - return 0; -} - const struct kfd2kgd_calls aldebaran_kfd2kgd = { .program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings, .set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping, @@ -193,7 +187,7 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = { .set_wave_launch_trap_override = kgd_aldebaran_set_wave_launch_trap_override, .set_wave_launch_mode = kgd_aldebaran_set_wave_launch_mode, .set_address_watch = kgd_gfx_aldebaran_set_address_watch, - .clear_address_watch = kgd_gfx_aldebaran_clear_address_watch, + .clear_address_watch = kgd_gfx_v9_clear_address_watch, .get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times, .build_grace_period_packet_info = kgd_gfx_v9_build_grace_period_packet_info, .program_trap_handler_settings = kgd_gfx_v9_program_trap_handler_settings,
Re: [PATCH] drm/amdkfd: fix double assign skip process context clear
On 2023-08-10 15:03, Jonathan Kim wrote: Remove redundant assignment when skipping process ctx clear. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index aa5091f18681..89c2bfcb36ce 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -227,7 +227,6 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q, queue_input.tba_addr = qpd->tba_addr; queue_input.tma_addr = qpd->tma_addr; queue_input.trap_en = !kfd_dbg_has_cwsr_workaround(q->device); - queue_input.skip_process_ctx_clear = qpd->pqm->process->debug_trap_enabled; queue_input.skip_process_ctx_clear = qpd->pqm->process->debug_trap_enabled || kfd_dbg_has_ttmps_always_setup(q->device);
Re: [PATCH] drm/amdkfd: Add missing tba_hi programming on aldebaran
On 2023-08-09 17:26, Jay Cornwall wrote: Previously asymptomatic because high 32 bits were zero. Fixes: 615222cfed20 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Signed-off-by: Jay Cornwall Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..8ce6f5200905 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -121,6 +121,7 @@ static int pm_map_process_aldebaran(struct packet_manager *pm, packet->sh_mem_bases = qpd->sh_mem_bases; if (qpd->tba_addr) { packet->sq_shader_tba_lo = lower_32_bits(qpd->tba_addr >> 8); + packet->sq_shader_tba_hi = upper_32_bits(qpd->tba_addr >> 8); packet->sq_shader_tma_lo = lower_32_bits(qpd->tma_addr >> 8); packet->sq_shader_tma_hi = upper_32_bits(qpd->tma_addr >> 8); }
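The bug Jay fixes is easy to see with the lower_32_bits()/upper_32_bits() helpers spelled out. Below is a minimal userspace model — the `tba_regs` struct is hypothetical, the real destinations are the `sq_shader_tba_lo`/`sq_shader_tba_hi` fields of the MAP_PROCESS packet. The TBA is programmed as `tba_addr >> 8` (a 256-byte-aligned base), so once the trap handler sits above 2^40 the shifted value no longer fits in 32 bits and skipping the `_hi` write silently truncates it:

```c
#include <stdint.h>

/* Userspace sketches of the kernel helpers of the same names. */
static inline uint32_t lower_32_bits(uint64_t v) { return (uint32_t)v; }
static inline uint32_t upper_32_bits(uint64_t v) { return (uint32_t)(v >> 32); }

/* Hypothetical stand-in for the lo/hi register pair in the packet. */
struct tba_regs {
    uint32_t lo;
    uint32_t hi;
};

static void program_tba(struct tba_regs *r, uint64_t tba_addr)
{
    r->lo = lower_32_bits(tba_addr >> 8);
    r->hi = upper_32_bits(tba_addr >> 8);  /* the write the fix adds */
}
```

This is why the bug was "previously asymptomatic": with the TBA below 2^40, `upper_32_bits(tba_addr >> 8)` is zero, which matches the register's reset value, and relocating the TBA/TMA to the far side of the VM hole is what first produced nonzero high bits.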
Re: [PATCH v2] drm/amdkfd: Use memdup_user() rather than duplicating its implementation
On 2023-08-09 01:30, Atul Raut wrote: To prevent its redundant implementation and streamline code, use memdup_user. This fixes warnings reported by Coccinelle: ./drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2811:13-20: WARNING opportunity for memdup_user Signed-off-by: Atul Raut The patch is Reviewed-by: Felix Kuehling I'm applying it to amd-staging-drm-next. Regards, Felix --- v1 -> v2 caller checks for errors, hence removed --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 2df153828ff4..df9b618756e6 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2803,19 +2803,11 @@ static void copy_context_work_handler (struct work_struct *work) static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array) { size_t array_size = num_queues * sizeof(uint32_t); - uint32_t *queue_ids = NULL; if (!usr_queue_id_array) return NULL; - queue_ids = kzalloc(array_size, GFP_KERNEL); - if (!queue_ids) - return ERR_PTR(-ENOMEM); - - if (copy_from_user(queue_ids, usr_queue_id_array, array_size)) - return ERR_PTR(-EFAULT); - - return queue_ids; + return memdup_user(usr_queue_id_array, array_size); } int resume_queues(struct kfd_process *p,
Re: drm/amdkfd: Use memdup_user() rather than duplicating its
On 2023-08-08 16:57, Atul Raut wrote: To prevent its redundant implementation and streamline code, use memdup_user. This fixes warnings reported by Coccinelle: ./drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2811:13-20: WARNING opportunity for memdup_user Signed-off-by: Atul Raut --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 2df153828ff4..51740e007e89 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2808,12 +2808,9 @@ static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array if (!usr_queue_id_array) return NULL; - queue_ids = kzalloc(array_size, GFP_KERNEL); - if (!queue_ids) - return ERR_PTR(-ENOMEM); - - if (copy_from_user(queue_ids, usr_queue_id_array, array_size)) - return ERR_PTR(-EFAULT); + queue_ids = memdup_user(usr_queue_id_array, array_size); + if (IS_ERR(Iqueue_ids)) You have a typo in the variable name here. Did you at least compile-test the patch? + return ERR_PTR(queue_ids); I think it should just return queue_ids here. That's already an ERR_PTR in case of errors. So you don't even need the "if". Just this should do the job: return memdup_user(usr_queue_id_array, array_size); The error checking is done by the caller. Regards, Felix return queue_ids; }
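Felix's point — that memdup_user() already returns an ERR_PTR-encoded pointer, so the wrapper can forward its return value directly without any IS_ERR check of its own — rests on the kernel convention of encoding small negative errnos in the pointer value itself. A minimal userspace sketch of that convention (`fake_memdup_user` is a hypothetical stand-in, not the kernel function):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace sketch of the kernel's ERR_PTR convention from <linux/err.h>:
 * addresses in the top 4095 bytes of the space are treated as errnos. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
    return (unsigned long)p >= (unsigned long)-MAX_ERRNO;
}

/* Fake memdup_user: duplicates a buffer, or fails the way -EFAULT/-ENOMEM
 * would in the kernel, with the errno encoded in the returned pointer. */
static void *fake_memdup_user(const void *src, size_t n)
{
    void *dst;

    if (!src)
        return ERR_PTR(-EFAULT);
    dst = malloc(n);
    if (!dst)
        return ERR_PTR(-ENOMEM);
    memcpy(dst, src, n);
    return dst;
}

/* Because the error travels inside the pointer, the wrapper can just
 * forward the return value -- exactly Felix's suggested one-liner. */
static uint32_t *get_queue_ids(uint32_t num, const uint32_t *usr)
{
    if (!usr)
        return NULL;
    return fake_memdup_user(usr, num * sizeof(uint32_t));
}
```

The caller then distinguishes the three outcomes — NULL (no array given), an ERR_PTR (copy or allocation failed), or a valid pointer — which is why wrapping the result in another ERR_PTR(), as the v1 patch did, would have double-encoded the error.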
Re: [PATCH V2 1/5] drm/amdkfd: ignore crat by default
On 2023-08-07 18:05, Alex Deucher wrote: We are dropping the IOMMUv2 path, so no need to enable this. It's often buggy on consumer platforms anyway. Signed-off-by: Alex Deucher The series is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 49f40d9f16e86..f5a6f562e2a80 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -1543,11 +1543,7 @@ static bool kfd_ignore_crat(void) if (ignore_crat) return true; -#ifndef KFD_SUPPORT_IOMMU_V2 ret = true; -#else - ret = false; -#endif return ret; }
Re: [PATCH] drm/amdkfd: wrap dynamic debug call with CONFIG_DYNAMIC_DEBUG_CORE
I just applied Arnd Bergmann's patch "drm/amdkfd: fix build failure without CONFIG_DYNAMIC_DEBUG". This patch is no longer needed. Regards, Felix On 2023-08-04 12:05, Alex Sierra wrote: This causes error compilation if CONFIG_DYNAMIC_DEBUG_CORE is not defined. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index a69994ff1c2f..cde4cc6afa83 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -824,6 +824,7 @@ svm_range_is_same_attrs(struct kfd_process *p, struct svm_range *prange, * * Context: The caller must hold svms->lock */ +#if defined(CONFIG_DYNAMIC_DEBUG_CORE) static void svm_range_debug_dump(struct svm_range_list *svms) { struct interval_tree_node *node; @@ -851,6 +852,7 @@ static void svm_range_debug_dump(struct svm_range_list *svms) node = interval_tree_iter_next(node, 0, ~0ULL); } } +#endif static void * svm_range_copy_array(void *psrc, size_t size, uint64_t num_elements, @@ -3594,7 +3596,9 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, break; } +#if defined(CONFIG_DYNAMIC_DEBUG_CORE) dynamic_svm_range_dump(svms); +#endif mutex_unlock(>lock); mmap_read_unlock(mm);
Re: [PATCH] drm/amdkfd: fix build failure without CONFIG_DYNAMIC_DEBUG
On 2023-08-04 9:29, Arnd Bergmann wrote: From: Arnd Bergmann When CONFIG_DYNAMIC_DEBUG is disabled altogether, calling _dynamic_func_call_no_desc() does not work: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function 'svm_range_set_attr': drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:52:9: error: implicit declaration of function '_dynamic_func_call_no_desc' [-Werror=implicit-function-declaration] 52 | _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) | ^~ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:3564:9: note: in expansion of macro 'dynamic_svm_range_dump' 3564 | dynamic_svm_range_dump(svms); | ^~ Add a compile-time conditional in addition to the runtime check. Fixes: 8923137dbe4b2 ("drm/amdkfd: avoid svm dump when dynamic debug disabled") Signed-off-by: Arnd Bergmann The patch is Reviewed-by: Felix Kuehling I'm applying it to amd-staging-drm-next. Thanks, Felix --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 308384dbc502d..44e710821b6d9 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -23,6 +23,7 @@ #include #include +#include #include #include @@ -48,8 +49,13 @@ * page table is updated. */ #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING (2UL * NSEC_PER_MSEC) +#if IS_ENABLED(CONFIG_DYNAMIC_DEBUG) #define dynamic_svm_range_dump(svms) \ _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) +#else +#define dynamic_svm_range_dump(svms) \ + do { if (0) svm_range_debug_dump(svms); } while (0) +#endif /* Giant svm range split into smaller ranges based on this, it is decided using * minimum of all dGPU/APU 1/32 VRAM size, between 2MB to 1GB and alignment to
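The `do { if (0) ... } while (0)` fallback Arnd adds is a standard kernel idiom: the call is provably dead so the compiler removes it, but the argument expression is still parsed and type-checked, which keeps a build with dynamic debug disabled from silently rotting. A small self-contained illustration (the stub function name is hypothetical):

```c
/* When the real dynamic-debug machinery is compiled out, the macro
 * expands to a dead call. The compiler eliminates it, so the (possibly
 * expensive) dump function is never invoked, yet `svms` is still
 * type-checked against the function's prototype at build time. */
static int dump_calls;

static void svm_range_debug_dump_stub(int *svms)
{
    (void)svms;
    dump_calls++;   /* would be expensive in the real svm_range_debug_dump */
}

#define dynamic_svm_range_dump(svms) \
    do { if (0) svm_range_debug_dump_stub(svms); } while (0)

static void set_attr_path(int *svms)
{
    dynamic_svm_range_dump(svms);   /* compiles, never executes */
}
```

Passing a wrong-typed argument here would still be a compile error, which is the advantage over simply defining the macro to nothing — and why this fallback is preferable to wrapping each call site in `#if defined(...)`, as the superseded patch did.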
Re: [PATCH] drm/amdkfd: avoid svm dump when dynamic debug disabled
Is your kernel configured without dynamic debugging? Maybe we need to wrap this in some #if defined(CONFIG_DYNAMIC_DEBUG_CORE). Regards, Felix Am 2023-08-03 um 15:38 schrieb Mike Lothian: Hi I'm seeing a compiler failure with Clang 16 CC drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.o drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:3568:2: error: call to undeclared function '_dynamic_func_call_no_desc'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] dynamic_svm_range_dump(svms); ^ drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:50:2: note: expanded from macro 'dynamic_svm_range_dump' _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) ^ 1 error generated. Cheers Mike On Wed, 19 Jul 2023 at 22:27, Felix Kuehling wrote: Am 2023-07-19 um 17:22 schrieb Alex Sierra: Set dynamic_svm_range_dump macro to avoid iterating over SVM lists from svm_range_debug_dump when dynamic debug is disabled. Otherwise, it could drop performance, specially with big number of SVM ranges. Make sure both svm_range_set_attr and svm_range_debug_dump functions are dynamically enabled to print svm_range_debug_dump debug traces. Signed-off-by: Alex Sierra Tested-by: Alex Sierra Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling I don't think my name on a Signed-off-by is appropriate here. I didn't write the patch. And I'm not submitting it. However, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 479c4f66afa7..1b50eae051a4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -46,6 +46,8 @@ * page table is updated. 
 */
 #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING	(2UL * NSEC_PER_MSEC)
+#define dynamic_svm_range_dump(svms) \
+	_dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms)
 
 /* Giant svm range split into smaller ranges based on this, it is decided using
  * minimum of all dGPU/APU 1/32 VRAM size, between 2MB to 1GB and alignment to
@@ -3563,7 +3565,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 		break;
 	}
 
-	svm_range_debug_dump(svms);
+	dynamic_svm_range_dump(svms);
 
 	mutex_unlock(&svms->lock);
 	mmap_read_unlock(mm);
Re: [PATCH 1/3] drm/amdkfd: Sync trap handler binaries with source
On 2023-07-31 16:40, Jay Cornwall wrote: Some changes have been lost during rebases. Rebuild sources. Signed-off-by: Jay Cornwall The series is Reviewed-by: Felix Kuehling --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 741 +- 1 file changed, 371 insertions(+), 370 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h index 73ca9aebf086..717ad0633dbe 100644 --- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h +++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h @@ -283,7 +283,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = { 0x866eff7b, 0x0400, 0xbf850051, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, - 0x866eff7b, 0x0900, + 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, 0x71ff, 0xbf840008, 0x866fff7b, 0x7080, @@ -1103,7 +1103,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = { 0x866eff7b, 0x0400, 0xbf850051, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, - 0x866eff7b, 0x0900, + 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, 0x71ff, 0xbf840008, 0x866fff7b, 0x7080, @@ -1581,7 +1581,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = { 0x866eff7b, 0x0400, 0xbf850051, 0xbf8e0010, 0xb8fbf803, 0xbf82fffa, - 0x866eff7b, 0x0900, + 0x866eff7b, 0x03c00900, 0xbf850015, 0x866eff7b, 0x71ff, 0xbf840008, 0x866fff7b, 0x7080, @@ -2494,6 +2494,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = { 0xbf9f, 0xbf9f, 0xbf9f, 0x, }; + static const uint32_t cwsr_trap_gfx11_hex[] = { 0xbfa1, 0xbfa00221, 0xb0804006, 0xb8f8f802, @@ -2938,211 +2939,149 @@ static const uint32_t cwsr_trap_gfx11_hex[] = { }; static const uint32_t cwsr_trap_gfx9_4_3_hex[] = { - 0xbf820001, 0xbf8202d6, - 0xb8f8f802, 0x89788678, - 0xb8fbf803, 0x866eff78, - 0x2000, 0xbf840009, - 0x866eff6d, 0x00ff, - 0xbf85001a, 0x866eff7b, - 0x0400, 0xbf85004d, - 0xbf8e0010, 0xb8fbf803, - 0xbf82fffa, 0x866eff7b, - 0x03c00900, 0xbf850011, - 0x866eff7b, 0x71ff, - 0xbf840008, 0x866fff7b, - 0x7080, 0xbf840001, - 0xbeee1a87, 0xb8eff801, - 0x8e6e8c6e, 0x866e6f6e, - 0xbf850006, 
0x866eff6d, - 0x00ff, 0xbf850003, + 0xbf820001, 0xbf8202d7, + 0xb8f8f802, 0x8978ff78, + 0x00020006, 0xb8fbf803, + 0x866eff78, 0x2000, + 0xbf840009, 0x866eff6d, + 0x00ff, 0xbf85001a, 0x866eff7b, 0x0400, - 0xbf850036, 0xb8faf807, + 0xbf85004d, 0xbf8e0010, + 0xb8fbf803, 0xbf82fffa, + 0x866eff7b, 0x03c00900, + 0xbf850011, 0x866eff7b, + 0x71ff, 0xbf840008, + 0x866fff7b, 0x7080, + 0xbf840001, 0xbeee1a87, + 0xb8eff801, 0x8e6e8c6e, + 0x866e6f6e, 0xbf850006, + 0x866eff6d, 0x00ff, + 0xbf850003, 0x866eff7b, + 0x0400, 0xbf850036, + 0xb8faf807, 0x867aff7a, + 0x001f8000, 0x8e7a8b7a, + 0x8979ff79, 0xfc00, + 0x87797a79, 0xba7ff807, + 0x, 0xb8faf812, + 0xb8fbf813, 0x8efa887a, + 0xc0031bbd, 0x0010, + 0xbf8cc07f, 0x8e6e976e, + 0x8979ff79, 0x0080, + 0x87796e79, 0xc0071bbd, + 0x, 0xbf8cc07f, + 0xc0071ebd, 0x0008, + 0xbf8cc07f, 0x86ee6e6e, + 0xbf840001, 0xbe801d6e, + 0x866eff6d, 0x01ff, + 0xbf850005, 0x8778ff78, + 0x2000, 0x80ec886c, + 0x82ed806d, 0xbf820005, + 0x866eff6d, 0x0100, + 0xbf850002, 0x806c846c, + 0x826d806d, 0x866dff6d, + 0x, 0x8f7a8b79, 0x867aff7a, 0x001f8000, - 0x8e7a8b7a, 0x8979ff79, - 0xfc00, 0x87797a79, - 0xba7ff807, 0x, - 0xb8faf812, 0xb8fbf813, - 0x8efa887a, 0xc0031bbd, - 0x0010, 0xbf8cc07f, - 0x8e6e976e, 0x8979ff79, - 0x0080, 0x87796e79, - 0xc0071bbd, 0x, - 0xbf8cc07f, 0xc0071ebd, - 0x0008, 0xbf8cc07f, - 0x86ee6e6e, 0xbf840001, - 0xbe801d6e, 0x866eff6d, - 0x01ff, 0xbf850005, - 0x8778ff78, 0x2000, - 0x80ec886c, 0x82ed806d, - 0xbf820005, 0x866eff6d, - 0x0100, 0xbf850002, - 0x806c846c, 0x826d806d, + 0xb97af807, 0x86fe7e7e, + 0x86ea6a6a, 0x8f6e8378, + 0xb96ee0c2, 0xbf82, + 0xb9780002, 0xbe801f6c, 0x866dff6d, 0x, - 0x8f7a8b79, 0x867aff7a, - 0x001f8000, 0xb97af807, - 0x86fe7e7e, 0x86ea6a6a, - 0x8f6e8378, 0xb96ee0c2, - 0xbf82, 0xb9780002, - 0xbe801f6c, 0x866dff6d, - 0x, 0xbefa0080, - 0xb97a0283, 0xb8faf807, - 0x867aff7a, 0x001f8000, - 0x8e7a8b7a, 0x8979ff79, - 0xfc00, 0x87797a79
Re: [PATCH] drm/amdkfd: avoid unmap dma address when svm_ranges are split
On 2023-07-28 17:41, Alex Sierra wrote: DMA address reference within svm_ranges should be unmapped only after the memory has been released from the system. In case of range splitting, the DMA address information should be copied to the corresponding range after this has split. But leaving dma mapping intact. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 7 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 61 +--- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 2 +- 3 files changed, 50 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 709ac885ca6d..7d82c7da223a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -461,7 +461,6 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, 0, node->id, trigger); svm_range_dma_unmap(adev->dev, scratch, 0, npages); - svm_range_free_dma_mappings(prange); out_free: kvfree(buf); @@ -543,10 +542,12 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc, addr = next; } - if (cpages) + if (cpages) { prange->actual_loc = best_loc; - else + svm_range_free_dma_mappings(prange, true); + } else { svm_range_vram_node_free(prange); + } return r < 0 ? 
r : 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 1b50eae051a4..a69994ff1c2f 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -241,7 +241,7 @@ void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr, } } -void svm_range_free_dma_mappings(struct svm_range *prange) +void svm_range_free_dma_mappings(struct svm_range *prange, bool unmap_dma) { struct kfd_process_device *pdd; dma_addr_t *dma_addr; @@ -262,13 +262,14 @@ void svm_range_free_dma_mappings(struct svm_range *prange) continue; } dev = >dev->adev->pdev->dev; - svm_range_dma_unmap(dev, dma_addr, 0, prange->npages); + if (unmap_dma) + svm_range_dma_unmap(dev, dma_addr, 0, prange->npages); kvfree(dma_addr); prange->dma_addr[gpuidx] = NULL; } } -static void svm_range_free(struct svm_range *prange, bool update_mem_usage) +static void svm_range_free(struct svm_range *prange, bool do_unmap) { uint64_t size = (prange->last - prange->start + 1) << PAGE_SHIFT; struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms); @@ -277,9 +278,9 @@ static void svm_range_free(struct svm_range *prange, bool update_mem_usage) prange->start, prange->last); svm_range_vram_node_free(prange); - svm_range_free_dma_mappings(prange); + svm_range_free_dma_mappings(prange, do_unmap); - if (update_mem_usage && !p->xnack_enabled) { + if (do_unmap && !p->xnack_enabled) { pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size); amdgpu_amdkfd_unreserve_mem_limit(NULL, size, KFD_IOC_ALLOC_MEM_FLAGS_USERPTR, 0); @@ -851,6 +852,37 @@ static void svm_range_debug_dump(struct svm_range_list *svms) } } +static void * +svm_range_copy_array(void *psrc, size_t size, uint64_t num_elements, +uint64_t offset) +{ + unsigned char *dst; + + dst = kvmalloc_array(num_elements, size, GFP_KERNEL); + if (!dst) + return NULL; + memcpy(dst, (unsigned char *)psrc + offset, num_elements * size); + + return (void *)dst; +} + +static int 
+svm_range_copy_dma_addrs(struct svm_range *dst, struct svm_range *src) +{ + int i; + + for (i = 0; i < MAX_GPU_INSTANCE; i++) { + if (!src->dma_addr[i]) + continue; + dst->dma_addr[i] = svm_range_copy_array(src->dma_addr[i], + sizeof(*src->dma_addr[i]), src->npages, 0); + if (!dst->dma_addr[i]) + return -ENOMEM; + } + + return 0; +} + static int svm_range_split_array(void *ppnew, void *ppold, size_t size, uint64_t old_start, uint64_t old_n, @@ -865,22 +897,16 @@ svm_range_split_array(void *ppnew, void *ppold, size_t size, if (!pold) return 0; - new = kvmalloc_array(new_n, size, GFP_KERNEL); + d = (new_start - old_start) * size; + new = svm_range_copy_array(pold, size, new_n, d); if (!new) return -ENOME
Re: [PATCH v3] drm/amdgpu: Add EXT_COHERENT memory allocation flags
On 2023-07-28 15:39, David Francis wrote: These flags (for GEM and SVM allocations) allocate memory that allows for system-scope atomic semantics. On GFX943 these flags cause caches to be avoided on non-local memory. On all other ASICs they are identical in functionality to the equivalent COHERENT flags. Corresponding Thunk patch is at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 v3: changed name of flag Signed-off-by: David Francis I made one comment on the user mode patch regarding the explicit handling of invalid combinations of Uncached, Coherent, ExtCoherent flags. I'm not sure what we agreed on any more. But I don't think we want to just leave it up to chance. Other than that, this patch looks good to me. Regards, Felix --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 5 - drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +- include/uapi/drm/amdgpu_drm.h| 10 +- include/uapi/linux/kfd_ioctl.h | 3 +++ 8 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index d34c3ef8f3ed..a1ce261f2d06 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1738,6 +1738,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( if (flags & KFD_IOC_ALLOC_MEM_FLAGS_COHERENT) alloc_flags |= AMDGPU_GEM_CREATE_COHERENT; + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT) + alloc_flags |= AMDGPU_GEM_CREATE_EXT_COHERENT; if (flags & KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED) alloc_flags |= AMDGPU_GEM_CREATE_UNCACHED; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 12210598e5b8..76b618735dc0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -331,6 +331,7 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, struct dma_buf *dma_buf) flags |= other->flags & (AMDGPU_GEM_CREATE_CPU_GTT_USWC | AMDGPU_GEM_CREATE_COHERENT | +AMDGPU_GEM_CREATE_EXT_COHERENT | AMDGPU_GEM_CREATE_UNCACHED); } diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c index 6b430e10d38e..301ffe30824f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -632,6 +632,7 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev, } if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + AMDGPU_GEM_CREATE_EXT_COHERENT | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_NV10_MASK) | AMDGPU_PTE_MTYPE_NV10(MTYPE_UC); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c index a6ee0220db56..846894e212e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -540,6 +540,7 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev, } if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + AMDGPU_GEM_CREATE_EXT_COHERENT | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_NV10_MASK) | AMDGPU_PTE_MTYPE_NV10(MTYPE_UC); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 880460cd3239..92a623e130d9 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1183,7 +1183,8 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, { struct amdgpu_device *bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM; - bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; + bool coherent = bo->flags & (AMDGPU_GEM_CREATE_COHERENT | AMDGPU_GEM_CREATE_EXT_COHERENT); + bool ext_coherent = bo->flags & AMDGPU_GEM_CREATE_EXT_COHERENT; bool uncached = bo->flags & 
AMDGPU_GEM_CREATE_UNCACHED; struct amdgpu_vm *vm = mapping->bo_va->base.vm; unsigned int mtype_local, mtype; @@ -1251,6 +1252,8 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, snoop = true; if (uncached) { mtype = MTYPE_UC; + } else if (ext_coherent) { +
Re: [PATCH 2/4] drm/amdkfd: disable IOMMUv2 support for KV/CZ
There are some APU-specific code paths for Kaveri and Carrizo in the device queue manager and MQD manager. I think a minimal fix would be to change device_queue_manager_init to call device_queue_manager_init_cik_hawaii for Kaveri and device_queue_manager_init_vi_tonga for Carrizo to use the dGPU code paths. Then we could probably remove the APU-specific functions and remove the _hawaii and _tonga suffixes from the dGPU functions. Regards, Felix On 2023-07-28 12:41, Alex Deucher wrote: Use the dGPU path instead. There were a lot of platform issues with IOMMU in general on these chips due to windows not enabling IOMMU at the time. The dGPU path has been used for a long time with newer APUs and works fine. This also paves the way to simplify the driver significantly. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 64772921ea43b..814a6116ca9bb 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -234,10 +234,6 @@ static void kfd_device_info_init(struct kfd_dev *kfd, asic_type != CHIP_TONGA) kfd->device_info.supports_cwsr = true; - if (asic_type == CHIP_KAVERI || - asic_type == CHIP_CARRIZO) - kfd->device_info.needs_iommu_device = true; - if (asic_type != CHIP_HAWAII && !vf) kfd->device_info.needs_pci_atomics = true; } @@ -250,7 +246,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf) uint32_t gfx_target_version = 0; switch (adev->asic_type) { -#ifdef KFD_SUPPORT_IOMMU_V2 #ifdef CONFIG_DRM_AMDGPU_CIK case CHIP_KAVERI: gfx_target_version = 7; @@ -263,7 +258,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf) if (!vf) f2g = _v8_kfd2kgd; break; -#endif #ifdef CONFIG_DRM_AMDGPU_CIK case CHIP_HAWAII: gfx_target_version = 70001;
Re: [PATCH] drm/amdkfd: avoid unmap dma address when svm_ranges are split
On 2023-07-27 19:43, Alex Sierra wrote: DMA address reference within svm_ranges should be unmapped only after the memory has been released from the system. In case of range splitting, the DMA address information should be copied to the corresponding range after this has split. But leaving dma mapping intact. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 67 ++-- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 2 +- 3 files changed, 52 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 709ac885ca6d..2586ac070190 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -461,7 +461,7 @@ svm_migrate_vma_to_vram(struct kfd_node *node, struct svm_range *prange, 0, node->id, trigger); svm_range_dma_unmap(adev->dev, scratch, 0, npages); - svm_range_free_dma_mappings(prange); + svm_range_free_dma_mappings(prange, true); Do we even need to call svm_range_dma_unmap just before? Looks like that's done inside svm_range_free_dma_mappings anyway. Maybe this should also be moved to svm_migrate_ram_to_vram because it affects the entire prange and not just one VMA. So you only need to do it once per prange. Let's clean that up in a follow up change. 
out_free: kvfree(buf); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 1b50eae051a4..d1ff1c7e96d0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -241,7 +241,7 @@ void svm_range_dma_unmap(struct device *dev, dma_addr_t *dma_addr, } } -void svm_range_free_dma_mappings(struct svm_range *prange) +void svm_range_free_dma_mappings(struct svm_range *prange, bool unmap_dma) { struct kfd_process_device *pdd; dma_addr_t *dma_addr; @@ -262,7 +262,8 @@ void svm_range_free_dma_mappings(struct svm_range *prange) continue; } dev = >dev->adev->pdev->dev; - svm_range_dma_unmap(dev, dma_addr, 0, prange->npages); + if (unmap_dma) + svm_range_dma_unmap(dev, dma_addr, 0, prange->npages); kvfree(dma_addr); prange->dma_addr[gpuidx] = NULL; } @@ -277,7 +278,7 @@ static void svm_range_free(struct svm_range *prange, bool update_mem_usage) I'd rename the update_mem_usage parameter to better represent what it means. Maybe something like "do_unmap". prange->start, prange->last); svm_range_vram_node_free(prange); - svm_range_free_dma_mappings(prange); + svm_range_free_dma_mappings(prange, update_mem_usage); if (update_mem_usage && !p->xnack_enabled) { pr_debug("unreserve prange 0x%p size: 0x%llx\n", prange, size); @@ -851,12 +852,46 @@ static void svm_range_debug_dump(struct svm_range_list *svms) } } +static int +svm_range_copy_array(void *ppdst, void *ppsrc, size_t size, ppdst and pprsc should be defined as void ** to avoid some ugly pointer type casts below. I'm not sure why ppsrc is a pointer to a pointer in the first place. I think it should just be a pointer because you don't need to update the caller's pointer. It may also be cleaner if you return the destination pointer as return value instead and return NULL if allocation failed. 
+uint64_t num_elements, uint64_t offset) +{ + unsigned char *dst, *psrc; + + psrc = *(unsigned char **)ppsrc; + dst = kvmalloc_array(num_elements, size, GFP_KERNEL); + if (!dst) + return -ENOMEM; + memcpy(dst, psrc + offset, num_elements * size); + *(void **)ppdst = dst; + + return 0; +} + +static int +svm_range_copy_dma_addrs(struct svm_range *dst, struct svm_range *src) +{ + int i, r; + + for (i = 0; i < MAX_GPU_INSTANCE; i++) { + if (!src->dma_addr[i]) + continue; + r = svm_range_copy_array(>dma_addr[i], >dma_addr[i], +sizeof(*src->dma_addr[i]), src->npages, 0); + if (r) + return r; + } + + return 0; +} + static int svm_range_split_array(void *ppnew, void *ppold, size_t size, uint64_t old_start, uint64_t old_n, uint64_t new_start, uint64_t new_n) { unsigned char *new, *old, *pold; + int r; uint64_t d; if (!ppold) @@ -865,22 +900,16 @@ svm_range_split_array(void *ppnew, void *ppold, size_t size, if (!pold) return 0; - new = kvmalloc_array(new_n, size, GFP_KERNEL); - if (!new) - return -ENOMEM; - d = (new_start - old_start) * size; -
Re: [PATCH v3] drm/amdgpu: Add EXT_COHERENCE memory allocation flags
In amdgpu_dma_buf_create_obj we copy the coherence-related flags to the SG BO that's used to attach the BO to the importer device. You need to add the new flag to the list. Some more nit-picks inline. Am 2023-07-26 um 09:34 schrieb David Francis: These flags (for GEM and SVM allocations) allocate memory that allows for system-scope atomic semantics. On GFX943 these flags cause caches to be avoided on non-local memory. On all other ASICs they are identical in functionality to the equivalent COHERENT flags. Corresponding Thunk patch is at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/88 Signed-off-by: David Francis --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c| 5 - drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 10 +- include/uapi/drm/amdgpu_drm.h| 7 +++ include/uapi/linux/kfd_ioctl.h | 3 +++ 7 files changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index d34c3ef8f3ed..7f23bc0ee592 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1738,6 +1738,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( if (flags & KFD_IOC_ALLOC_MEM_FLAGS_COHERENT) alloc_flags |= AMDGPU_GEM_CREATE_COHERENT; + if (flags & KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENCE) + alloc_flags |= AMDGPU_GEM_CREATE_EXT_COHERENCE; if (flags & KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED) alloc_flags |= AMDGPU_GEM_CREATE_UNCACHED; diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c index 6b430e10d38e..8e951688668b 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c @@ -632,6 +632,7 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev, } if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + 
AMDGPU_GEM_CREATE_EXT_COHERENCE | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_NV10_MASK) | AMDGPU_PTE_MTYPE_NV10(MTYPE_UC); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c index a6ee0220db56..ff330c7c0232 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c @@ -540,6 +540,7 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev, } if (bo && bo->flags & (AMDGPU_GEM_CREATE_COHERENT | + AMDGPU_GEM_CREATE_EXT_COHERENCE | AMDGPU_GEM_CREATE_UNCACHED)) *flags = (*flags & ~AMDGPU_PTE_MTYPE_NV10_MASK) | AMDGPU_PTE_MTYPE_NV10(MTYPE_UC); diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index 880460cd3239..e40fcfc1a3f3 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -1183,7 +1183,8 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, { struct amdgpu_device *bo_adev = amdgpu_ttm_adev(bo->tbo.bdev); bool is_vram = bo->tbo.resource->mem_type == TTM_PL_VRAM; - bool coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT; + bool coherent = bo->flags & (AMDGPU_GEM_CREATE_COHERENT | AMDGPU_GEM_CREATE_EXT_COHERENCE); + bool ext_coherence = bo->flags & AMDGPU_GEM_CREATE_EXT_COHERENCE; bool uncached = bo->flags & AMDGPU_GEM_CREATE_UNCACHED; struct amdgpu_vm *vm = mapping->bo_va->base.vm; unsigned int mtype_local, mtype; @@ -1251,6 +1252,8 @@ static void gmc_v9_0_get_coherence_flags(struct amdgpu_device *adev, snoop = true; if (uncached) { mtype = MTYPE_UC; + } else if (ext_coherence) { + mtype = is_local ? MTYPE_CC : MTYPE_UC; } else if (adev->flags & AMD_IS_APU) { mtype = is_local ? 
mtype_local : MTYPE_NC; } else { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 1b50eae051a4..28304b93a990 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -1155,7 +1155,8 @@ svm_range_get_pte_flags(struct kfd_node *node, uint32_t mapping_flags = 0; uint64_t pte_flags; bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN); - bool coherent = flags & KFD_IOCTL_SVM_FLAG_COHERENT; + bool coherent = flags & (KFD_IOCTL_SVM_FLAG_COHERENT | KFD_IOCTL_SVM_FLAG_EXT_COHERENCE); + bool ext_coherence = flags &
Re: [Patch V2 v2] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA
On 2023-07-25 at 17:11, Ramesh Errabolu wrote:

Extend checkpoint logic to allow inclusion of VRAM BOs that do not have a VA attached.

Signed-off-by: Ramesh Errabolu
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 40ac093b5035..44c647c82070 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1845,7 +1845,8 @@ static uint32_t get_process_num_bos(struct kfd_process *p)
 	idr_for_each_entry(&pdd->alloc_idr, mem, id) {
 		struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;

-		if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+		if (((uint64_t)kgd_mem->va > pdd->gpuvm_base) ||

Unnecessary parentheses around (a > b).

+		    !kgd_mem->va)
 			num_of_bos++;
 	}
 }
@@ -1917,7 +1918,11 @@ static int criu_checkpoint_bos(struct kfd_process *p,
 		kgd_mem = (struct kgd_mem *)mem;
 		dumper_bo = kgd_mem->bo;

-		if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base)
+		/* Skip checkpointing BOs that are used for Trap handler
+		 * code and state. Currently, these BOs have a VA that
+		 * is less than the GPUVM base
+		 */
+		if (((uint64_t)kgd_mem->va <= pdd->gpuvm_base) && kgd_mem->va)

Unnecessary parentheses around (a <= b). In this condition I'd also prefer to put kgd_mem->va first, because it short-circuits execution for the case that va is NULL.

With that fixed, the patch is

Reviewed-by: Felix Kuehling

 			continue;

 		bo_bucket = &bo_buckets[bo_index];
Re: [PATCH] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA
Am 2023-07-25 um 16:04 schrieb Errabolu, Ramesh: [AMD Official Use Only - General] Responses inline. -Original Message- From: Kuehling, Felix Sent: Monday, July 24, 2023 2:51 PM To: amd-gfx@lists.freedesktop.org; Errabolu, Ramesh Subject: Re: [PATCH] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA On 2023-07-24 11:57, Ramesh Errabolu wrote: Extend checkpoint logic to allow inclusion of VRAM BOs that do not have a VA attached Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 40ac093b5035..5cc00ff4b635 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1845,7 +1845,8 @@ static uint32_t get_process_num_bos(struct kfd_process *p) idr_for_each_entry(>alloc_idr, mem, id) { struct kgd_mem *kgd_mem = (struct kgd_mem *)mem; - if ((uint64_t)kgd_mem->va > pdd->gpuvm_base) + if (((uint64_t)kgd_mem->va > pdd->gpuvm_base) || + (kgd_mem->va == 0)) I'm trying to remember what this condition is there to protect against, because it almost looks like we could drop the entire condition. I think it's there to avoid checkpointing the TMA/TBA BOs allocated by KFD itself. Ramesh: I am unsure as to how we can detect TMA/TBA BOs if we don't want them checkpointed. Alternatively we can checkpoint and restore TMA/TBA BOs without loss of function. If this o.k. we can drop the check that determines BO qualification. It's OK. Currently they have a VA > 0 and < gpuvm_base. So this check will still work if you only allow BOs with VA == 0. There is a patch in the works to move the TMA and TBA to the upper half of the virtual address space. Then we'll need to update this check to exclude anything that has bit 63 of the VA set. Regards, Felix That said, you have some unnecessary parentheses in this expression. 
And just using !x to check for 0 is usually preferred in the kernel. This should work and is more readable IMO: + if ((uint64_t)kgd_mem->va > pdd->gpuvm_base || !kgd_mem->va) num_of_bos++; } } @@ -1917,7 +1918,8 @@ static int criu_checkpoint_bos(struct kfd_process *p, kgd_mem = (struct kgd_mem *)mem; dumper_bo = kgd_mem->bo; - if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base) + if (((uint64_t)kgd_mem->va <= pdd->gpuvm_base) && + !(kgd_mem->va == 0)) Similar to above: + if (kgd_mem->va && (uint64_t)kgd_mem->va <= pdd->gpuvm_base) Regards, Felix continue; bo_bucket = _buckets[bo_index];
Re: [PATCH] drm/amdkfd: start_cpsch don't map queues
On 2023-07-24 13:52, Philip Yang wrote:

Mapping queues from start_cpsch races with IOMMUv2 init during kfd_init_node, which causes the gfx ring test to fail later. Remove the call from start_cpsch; queues will be mapped when queues are created and resumed.

Reported-by: Michel Dänzer
Signed-off-by: Philip Yang

Reviewed-by: Felix Kuehling

Michel, can you test whether this fixes your regression on Raven? It would be good to get a Tested-by for this patch, since we haven't been able to reproduce the problem yet.

Thanks,
  Felix

---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 71b7f16c0173..a2d0d0bcf853 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1658,9 +1658,6 @@ static int start_cpsch(struct device_queue_manager *dqm)
 	dqm->is_resetting = false;
 	dqm->sched_running = true;
 
-	if (!dqm->dev->kfd->shared_resources.enable_mes)
-		execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
-
 	/* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU */
 	if (amdgpu_emu_mode == 0 && dqm->dev->adev->gmc.is_app_apu &&
 	    (KFD_GC_VERSION(dqm->dev) == IP_VERSION(9, 4, 3))) {
Re: [PATCH] drm/amdgpu: Checkpoint and Restore VRAM BOs without VA
On 2023-07-24 11:57, Ramesh Errabolu wrote: Extend checkpoint logic to allow inclusion of VRAM BOs that do not have a VA attached Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 40ac093b5035..5cc00ff4b635 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1845,7 +1845,8 @@ static uint32_t get_process_num_bos(struct kfd_process *p) idr_for_each_entry(>alloc_idr, mem, id) { struct kgd_mem *kgd_mem = (struct kgd_mem *)mem; - if ((uint64_t)kgd_mem->va > pdd->gpuvm_base) + if (((uint64_t)kgd_mem->va > pdd->gpuvm_base) || + (kgd_mem->va == 0)) I'm trying to remember what this condition is there to protect against, because it almost looks like we could drop the entire condition. I think it's there to avoid checkpointing the TMA/TBA BOs allocated by KFD itself. That said, you have some unnecessary parentheses in this expression. And just using !x to check for 0 is usually preferred in the kernel. This should work and is more readable IMO: + if ((uint64_t)kgd_mem->va > pdd->gpuvm_base || !kgd_mem->va) num_of_bos++; } } @@ -1917,7 +1918,8 @@ static int criu_checkpoint_bos(struct kfd_process *p, kgd_mem = (struct kgd_mem *)mem; dumper_bo = kgd_mem->bo; - if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base) + if (((uint64_t)kgd_mem->va <= pdd->gpuvm_base) && + !(kgd_mem->va == 0)) Similar to above: + if (kgd_mem->va && (uint64_t)kgd_mem->va <= pdd->gpuvm_base) Regards, Felix continue; bo_bucket = _buckets[bo_index];
Re: [PATCH] drm/amdkfd: avoid svm dump when dynamic debug disabled
On 2023-07-19 at 17:22, Alex Sierra wrote:

Define a dynamic_svm_range_dump macro to avoid iterating over SVM lists in svm_range_debug_dump when dynamic debug is disabled. Otherwise it could hurt performance, especially with a large number of SVM ranges. Both svm_range_set_attr and svm_range_debug_dump must be dynamically enabled to print the svm_range_debug_dump debug traces.

Signed-off-by: Alex Sierra
Tested-by: Alex Sierra
Signed-off-by: Philip Yang
Signed-off-by: Felix Kuehling

I don't think my name on a Signed-off-by is appropriate here. I didn't write the patch, and I'm not submitting it. However, the patch is

Reviewed-by: Felix Kuehling

---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 479c4f66afa7..1b50eae051a4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -46,6 +46,8 @@
  * page table is updated.
  */
 #define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING	(2UL * NSEC_PER_MSEC)
+#define dynamic_svm_range_dump(svms) \
+	_dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms)
 
 /* Giant svm range split into smaller ranges based on this, it is decided using
  * minimum of all dGPU/APU 1/32 VRAM size, between 2MB to 1GB and alignment to
@@ -3563,7 +3565,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
 		break;
 	}
 
-	svm_range_debug_dump(svms);
+	dynamic_svm_range_dump(svms);
 
 	mutex_unlock(&svms->lock);
 	mmap_read_unlock(mm);
Re: [PATCH] drm/amdkfd: avoid svm dump when dynamic debug disabled
Am 2023-07-19 um 14:03 schrieb Alex Sierra: Set dynamic_svm_range_dump macro to avoid iterating over SVM lists from svm_range_debug_dump when dynamic debug is disabled. Otherwise, it could drop performance, specially with big number of SVM ranges. Make sure both svm_range_set_attr and svm_range_debug_dump functions are dynamically enabled to print svm_range_debug_dump debug traces. Signed-off-by: Alex Sierra Tested-by: Alex Sierra Signed-off-by: Philip Yang Signed-off-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_svm.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 479c4f66afa7..0687f27f506c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -3563,7 +3563,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, break; } - svm_range_debug_dump(svms); + dynamic_svm_range_dump(svms); mutex_unlock(>lock); mmap_read_unlock(mm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h index 21b14510882b..ed4cd501fafe 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h @@ -39,6 +39,9 @@ #define SVM_ADEV_PGMAP_OWNER(adev)\ ((adev)->hive ? (void *)(adev)->hive : (void *)(adev)) +#define dynamic_svm_range_dump(svms) \ + _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) + This should be in kfd_svm.c. The function svm_range_debug_dump is a static function in that file. This macro is not useful outside of it. Regards, Felix struct svm_range_bo { struct amdgpu_bo*bo; struct kref kref;
Re: [PATCH] drm/amdkfd: avoid svm dump when dynamic debug disabled
Am 2023-07-08 um 12:57 schrieb Alex Sierra: svm_range_debug_dump should not be called at all when dynamic debug is disabled to avoid iterating over SVM lists. This could drop performance, specially with big number of SVM ranges. Signed-off-by: Alex Sierra Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 479c4f66afa7..4fb427fc5942 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -821,7 +821,7 @@ svm_range_is_same_attrs(struct kfd_process *p, struct svm_range *prange, * * Context: The caller must hold svms->lock */ -static void svm_range_debug_dump(struct svm_range_list *svms) +static int svm_range_debug_dump(struct svm_range_list *svms) { struct interval_tree_node *node; struct svm_range *prange; @@ -847,6 +847,8 @@ static void svm_range_debug_dump(struct svm_range_list *svms) prange->actual_loc); node = interval_tree_iter_next(node, 0, ~0ULL); } + + return 0; } static int @@ -3563,7 +3565,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm, break; } - svm_range_debug_dump(svms); + pr_debug("%d", svm_range_debug_dump(svms)); This is a bit hacky. I would use the way dynamic_hex_dump is defined as an example for how to do this without the dummy pr_debug and without returning a dummy result from svm_range_debug_dump: #define dynamic_svm_range_dump(svms) \ _dynamic_func_call_no_desc("svm_range_dump", svm_range_debug_dump, svms) Then instead of calling svm_range_debug_dump directly, call dynamic_svm_range_dump(svms). Regards, Felix mutex_unlock(>lock); mmap_read_unlock(mm);
Re: [PATCH] drm/amdkfd: enable cooperative groups for gfx11
Am 2023-07-19 um 10:36 schrieb Jonathan Kim: MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS. Similar to the F32 HWS, MES will manage quality of service for these queues. Use this for cooperative groups since cooperative groups are device occupancy limited. Since some GFX11 devices can only be debugged with partial CUs, do not allow the debugging of cooperative groups on these devices as the CU occupancy limit will change on attach. In addition, zero initialize the MES add queue submission vector for MES initialization tests as we do not want these to be cooperative dispatches. v2: fix up indentation and comments. remove unnecessary perf warning on oversubscription. change 0 init to 0 memset to deal with padding. Signed-off-by: Jonathan Kim Sorry. More indentation nit-picks inline. With those fixed, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c| 7 ++- .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 12 drivers/gpu/drm/amd/include/mes_v11_api_def.h| 4 +++- 9 files changed, 27 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index f808841310fd..72ab6a838bb6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -642,6 +642,8 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id, unsigned long flags; int r; + memset(_input, 0, sizeof(struct mes_add_queue_input)); + /* allocate the mes queue buffer */ queue = kzalloc(sizeof(struct amdgpu_mes_queue), GFP_KERNEL); if (!queue) { diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index 2d6ac30b7135..2053954a235c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -224,6 +224,7 @@ struct mes_add_queue_input { uint32_tis_kfd_process; uint32_tis_aql_queue; uint32_tqueue_size; + uint32_texclusively_scheduled; }; struct mes_remove_queue_input { diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 1bdaa00c0b46..8e67e965f7ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -214,6 +214,8 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes, mes_add_queue_pkt.is_aql_queue = input->is_aql_queue; mes_add_queue_pkt.gds_size = input->queue_size; + mes_add_queue_pkt.exclusively_scheduled = input->exclusively_scheduled; + return mes_v11_0_submit_pkt_and_poll_completion(mes, _add_queue_pkt, sizeof(mes_add_queue_pkt), offsetof(union MESAPI__ADD_QUEUE, api_status)); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 40ac093b5035..e0f9cf6dd8fd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1487,7 +1487,8 @@ static int kfd_ioctl_alloc_queue_gws(struct file *filep, goto out_unlock; } - if (!kfd_dbg_has_gws_support(dev) && p->debug_trap_enabled) { + if (p->debug_trap_enabled && (!kfd_dbg_has_gws_support(dev) || + kfd_dbg_has_cwsr_workaround(dev))) { retval = -EBUSY; goto out_unlock; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index ccfc81f085ce..1f82caea59ba 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -753,7 +753,8 @@ int kfd_dbg_trap_enable(struct kfd_process *target, uint32_t fd, if (!KFD_IS_SOC15(pdd->dev)) return -ENODEV; - if (!kfd_dbg_has_gws_support(pdd->dev) && pdd->qpd.num_gws) + if (pdd->qpd.num_gws && 
(!kfd_dbg_has_gws_support(pdd->dev) || +kfd_dbg_has_cwsr_workaround(pdd->dev))) return -EBUSY; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 0b3dc754e06b..ebc9674d3ce1 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -508,6 +508,7 @@ stati
Re: [PATCH v2 2/4] drm/amdkfd: use vma_is_initial_stack() and vma_is_initial_heap()
Am 2023-07-19 um 03:51 schrieb Kefeng Wang: Use the helpers to simplify code. Cc: Felix Kuehling Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Signed-off-by: Kefeng Wang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index 5ff1a5a89d96..0b7bfbd0cb66 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c @@ -2621,10 +2621,7 @@ svm_range_get_range_boundaries(struct kfd_process *p, int64_t addr, return -EFAULT; } - *is_heap_stack = (vma->vm_start <= vma->vm_mm->brk && - vma->vm_end >= vma->vm_mm->start_brk) || -(vma->vm_start <= vma->vm_mm->start_stack && - vma->vm_end >= vma->vm_mm->start_stack); + *is_heap_stack = vma_is_initial_heap(vma) || vma_is_initial_stack(vma); start_limit = max(vma->vm_start >> PAGE_SHIFT, (unsigned long)ALIGN_DOWN(addr, 2UL << 8));
Re: [PATCH 1/2] drm/amdkfd: fix trap handling work around for debugging
Am 2023-07-14 um 05:37 schrieb Jonathan Kim: Update the list of devices that require the cwsr trap handling workaround for debugging use cases. Signed-off-by: Jonathan Kim This patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 5 ++--- drivers/gpu/drm/amd/amdkfd/kfd_debug.h| 6 ++ drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 6 ++ 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 190b03efe5ff..ccfc81f085ce 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -302,8 +302,7 @@ static int kfd_dbg_set_queue_workaround(struct queue *q, bool enable) if (!q) return 0; - if (KFD_GC_VERSION(q->device) < IP_VERSION(11, 0, 0) || - KFD_GC_VERSION(q->device) >= IP_VERSION(12, 0, 0)) + if (!kfd_dbg_has_cwsr_workaround(q->device)) return 0; if (enable && q->properties.is_user_cu_masked) @@ -349,7 +348,7 @@ int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd) { uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd->spi_dbg_launch_mode; uint32_t flags = pdd->process->dbg_flags; - bool sq_trap_en = !!spi_dbg_cntl; + bool sq_trap_en = !!spi_dbg_cntl || !kfd_dbg_has_cwsr_workaround(pdd->dev); if (!kfd_dbg_is_per_vmid_supported(pdd->dev)) return 0; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h index ba616ed17dee..586d7f886712 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h @@ -101,6 +101,12 @@ static inline bool kfd_dbg_is_rlc_restore_supported(struct kfd_node *dev) KFD_GC_VERSION(dev) == IP_VERSION(10, 1, 1)); } +static inline bool kfd_dbg_has_cwsr_workaround(struct kfd_node *dev) +{ + return KFD_GC_VERSION(dev) >= IP_VERSION(11, 0, 0) && + KFD_GC_VERSION(dev) <= IP_VERSION(11, 0, 3); +} + static inline bool kfd_dbg_has_gws_support(struct kfd_node *dev) { if ((KFD_GC_VERSION(dev) == IP_VERSION(9, 0, 
1) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 31cac1fd0d58..761963ad6154 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -226,8 +226,7 @@ static int add_queue_mes(struct device_queue_manager *dqm, struct queue *q, queue_input.paging = false; queue_input.tba_addr = qpd->tba_addr; queue_input.tma_addr = qpd->tma_addr; - queue_input.trap_en = KFD_GC_VERSION(q->device) < IP_VERSION(11, 0, 0) || - KFD_GC_VERSION(q->device) > IP_VERSION(11, 0, 3); + queue_input.trap_en = !kfd_dbg_has_cwsr_workaround(q->device); queue_input.skip_process_ctx_clear = qpd->pqm->process->debug_trap_enabled; queue_type = convert_to_mes_queue_type(q->properties.type); @@ -1827,8 +1826,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q, */ q->properties.is_evicted = !!qpd->evicted; q->properties.is_dbg_wa = qpd->pqm->process->debug_trap_enabled && - KFD_GC_VERSION(q->device) >= IP_VERSION(11, 0, 0) && - KFD_GC_VERSION(q->device) <= IP_VERSION(11, 0, 3); + kfd_dbg_has_cwsr_workaround(q->device); if (qd) mqd_mgr->restore_mqd(mqd_mgr, >mqd, q->mqd_mem_obj, >gart_mqd_addr,
Re: [PATCH 2/2] drm/amdkfd: enable cooperative groups for gfx11
Am 2023-07-14 um 05:37 schrieb Jonathan Kim: MES can concurrently schedule queues on the device that require exclusive device access if marked exclusively_scheduled without the requirement of GWS. Similar to the F32 HWS, MES will manage quality of service for these queues. Use this for cooperative groups since cooperative groups are device occupancy limited. Since some GFX11 devices can only be debugged with partial CUs, do not allow the debugging of cooperative groups on these devices as the CU occupancy limit will change on attach. In addition, zero initialize the MES add queue submission vector for MES initialization tests as we do not want these to be cooperative dispatches. NOTE: FIXME MES FW enablement checks are a placeholder at the moment and will be updated when the binary revision number is finalized. Signed-off-by: Jonathan Kim Some nit-picks inline. Looks good to me otherwise. --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 1 + drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 - .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c| 11 +++ drivers/gpu/drm/amd/include/mes_v11_api_def.h | 4 +++- 9 files changed, 27 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index e9091ebfe230..8d13623389d8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -638,7 +638,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int gang_id, { struct amdgpu_mes_queue *queue; struct amdgpu_mes_gang *gang; - struct mes_add_queue_input queue_input; + struct mes_add_queue_input queue_input = {0}; Generally, it is preferred to use memset to initialize structures on the stack because that also initializes padding. 
unsigned long flags; int r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index 2d6ac30b7135..2053954a235c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -224,6 +224,7 @@ struct mes_add_queue_input { uint32_tis_kfd_process; uint32_tis_aql_queue; uint32_tqueue_size; + uint32_texclusively_scheduled; }; struct mes_remove_queue_input { diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 1bdaa00c0b46..8e67e965f7ea 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -214,6 +214,8 @@ static int mes_v11_0_add_hw_queue(struct amdgpu_mes *mes, mes_add_queue_pkt.is_aql_queue = input->is_aql_queue; mes_add_queue_pkt.gds_size = input->queue_size; + mes_add_queue_pkt.exclusively_scheduled = input->exclusively_scheduled; + return mes_v11_0_submit_pkt_and_poll_completion(mes, _add_queue_pkt, sizeof(mes_add_queue_pkt), offsetof(union MESAPI__ADD_QUEUE, api_status)); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 40ac093b5035..e18401811956 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1487,7 +1487,8 @@ static int kfd_ioctl_alloc_queue_gws(struct file *filep, goto out_unlock; } - if (!kfd_dbg_has_gws_support(dev) && p->debug_trap_enabled) { + if (p->debug_trap_enabled && (!kfd_dbg_has_gws_support(dev) || + kfd_dbg_has_cwsr_workaround(dev))) { Indentation looks off. kfd_dbg_has_cwsr_workaround should be indented one less space. Otherwise you may be incorrectly implying that the ! applies to it. 
retval = -EBUSY; goto out_unlock; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index ccfc81f085ce..895e7f690fd0 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -753,7 +753,8 @@ int kfd_dbg_trap_enable(struct kfd_process *target, uint32_t fd, if (!KFD_IS_SOC15(pdd->dev)) return -ENODEV; - if (!kfd_dbg_has_gws_support(pdd->dev) && pdd->qpd.num_gws) + if (pdd->qpd.num_gws && (!kfd_dbg_has_gws_support(pdd->dev) || + kfd_dbg_has_cwsr_workaround(pdd->dev))) Same as above. return -EBUSY; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
Re: [PATCH 4/4] drm/amdgpu: use a macro to define no xcp partition case
On 2023-07-16 22:26, Guchun Chen wrote: ~0 as no xcp partition is used in several places, so improve its definition by a macro for code consistency. Suggested-by: Christian König Signed-off-by: Guchun Chen The series is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 2 ++ drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 4 ++-- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index a7f314ddd173..d34c3ef8f3ed 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1709,7 +1709,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( alloc_flags |= (flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) ? AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED : 0; } - xcp_id = fpriv->xcp_id == ~0 ? 0 : fpriv->xcp_id; + xcp_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ? + 0 : fpriv->xcp_id; } else if (flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) { domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT; alloc_flags = 0; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c index d175e862f222..9c9cca129498 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c @@ -363,7 +363,7 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev, if (!adev->xcp_mgr) return 0; - fpriv->xcp_id = ~0; + fpriv->xcp_id = AMDGPU_XCP_NO_PARTITION; for (i = 0; i < MAX_XCP; ++i) { if (!adev->xcp_mgr->xcp[i].ddev) break; @@ -381,7 +381,7 @@ int amdgpu_xcp_open_device(struct amdgpu_device *adev, } } - fpriv->vm.mem_id = fpriv->xcp_id == ~0 ? -1 : + fpriv->vm.mem_id = fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION ? 
-1 : adev->xcp_mgr->xcp[fpriv->xcp_id].mem_id; return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h index 0f8026d64ea5..9a1036aeec2a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h @@ -37,6 +37,8 @@ #define AMDGPU_XCP_FL_NONE 0 #define AMDGPU_XCP_FL_LOCKED (1 << 0) +#define AMDGPU_XCP_NO_PARTITION (~0) + struct amdgpu_fpriv; enum AMDGPU_XCP_IP_BLOCK { diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c index 16471b81a1f5..72b629a78c62 100644 --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c @@ -68,7 +68,7 @@ static void aqua_vanjaram_set_xcp_id(struct amdgpu_device *adev, enum AMDGPU_XCP_IP_BLOCK ip_blk; uint32_t inst_mask; - ring->xcp_id = ~0; + ring->xcp_id = AMDGPU_XCP_NO_PARTITION; if (adev->xcp_mgr->mode == AMDGPU_XCP_MODE_NONE) return; @@ -177,7 +177,7 @@ static int aqua_vanjaram_select_scheds( u32 sel_xcp_id; int i; - if (fpriv->xcp_id == ~0) { + if (fpriv->xcp_id == AMDGPU_XCP_NO_PARTITION) { u32 least_ref_cnt = ~0; fpriv->xcp_id = 0;
Re: [PATCH 3/5] drm/amdkfd: use vma_is_stack() and vma_is_heap()
Am 2023-07-14 um 10:26 schrieb Vlastimil Babka: On 7/12/23 18:24, Felix Kuehling wrote: Allocations in the heap and stack tend to be small, with several allocations sharing the same page. Sharing the same page for different allocations with different access patterns leads to thrashing when we migrate data back and forth on GPU and CPU access. To avoid this we disable HMM migrations for head and stack VMAs. Wonder how well does it really work in practice? AFAIK "heaps" (malloc()) today uses various arenas obtained by mmap() and not a single brk() managed space anymore? And programs might be multithreaded, thus have multiple stacks, while vma_is_stack() will recognize only the initial one... Thanks for these pointers. I have not heard of such problems with mmap arenas and multiple thread stacks in practice. But I'll keep it in mind in case we observe unexpected thrashing in the future. FWIW, we once had the opposite problem of a custom malloc implementation that used sbrk for very large allocations. This disabled migrations of large buffers unexpectedly. I agree that eventually we'll want a more dynamic way of detecting and suppressing thrashing that's based on observed memory access patterns. Getting this right is probably trickier than it sounds, so I'd prefer to have some more experience with real workloads to use as benchmarks. Compared to other things we're working on, this is fairly low on our priority list at the moment. Using the VMA flags is a simple and effective method for now, at least until we see it failing in real workloads. Regards, Felix Vlastimil Regards, Felix Am 2023-07-12 um 10:42 schrieb Christoph Hellwig: On Wed, Jul 12, 2023 at 10:38:29PM +0800, Kefeng Wang wrote: Use the helpers to simplify code. Nothing against your addition of a helper, but a GPU driver really should have no business even looking at this information..
Re: [PATCH v3 09/12] drm/amdgpu: use doorbell manager for kfd process doorbells
On 2023-06-20 13:16, Shashank Sharma wrote: This patch: - adds a doorbell object in kfd pdd structure. - allocates doorbells for a process while creating its queue. - frees the doorbells with pdd destroy. - moves doorbell bitmap init function to kfd_doorbell.c PS: This patch ensures that we don't break the existing KFD functionality, but now KFD userspace library should also create doorbell pages as AMDGPU GEM objects using libdrm functions in userspace. The reference code for the same is available with AMDGPU Usermode queue libdrm MR. Once this is done, we will not need to create process doorbells in kernel. V2: - Do not use doorbell wrapper API, use amdgpu_bo_create_kernel instead (Alex). - Do not use custom doorbell structure, instead use separate variables for bo and doorbell_bitmap (Alex) V3: - Do not allocate doorbell page with PDD, delay doorbell process page allocation until really needed (Felix) Cc: Alex Deucher Cc: Christian Koenig Cc: Felix Kuehling Acked-by: Christian König Signed-off-by: Shashank Sharma Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 20 ++-- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 103 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 40 +-- .../amd/amdkfd/kfd_process_queue_manager.c| 23 ++-- 6 files changed, 108 insertions(+), 95 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 1b54a9aaae70..5d4f4fca793a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -327,10 +327,12 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, goto err_bind_process; } - if (!pdd->doorbell_index && - kfd_alloc_process_doorbells(dev, >doorbell_index) < 0) { - err = -ENOMEM; - goto err_alloc_doorbells; + if (!pdd->qpd.proc_doorbells) { + err = kfd_alloc_process_doorbells(dev, pdd); + if (err) 
{ + pr_debug("failed to allocate process doorbells\n"); + goto err_bind_process; + } } /* Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work @@ -410,7 +412,6 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, if (wptr_bo) amdgpu_amdkfd_free_gtt_mem(dev->adev, wptr_bo); err_wptr_map_gart: -err_alloc_doorbells: err_bind_process: err_pdd: mutex_unlock(>mutex); @@ -2239,11 +2240,12 @@ static int criu_restore_devices(struct kfd_process *p, goto exit; } - if (!pdd->doorbell_index && - kfd_alloc_process_doorbells(pdd->dev, >doorbell_index) < 0) { - ret = -ENOMEM; - goto exit; + if (!pdd->qpd.proc_doorbells) { + ret = kfd_alloc_process_doorbells(dev, pdd); + if (ret) + goto exit; } + } /* diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 7a95698d83f7..834f640cf807 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -371,7 +371,7 @@ static int allocate_doorbell(struct qcm_process_device *qpd, unsigned int found; found = find_first_zero_bit(qpd->doorbell_bitmap, - KFD_MAX_NUM_OF_QUEUES_PER_PROCESS); + KFD_MAX_NUM_OF_QUEUES_PER_PROCESS); if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) { pr_debug("No doorbells available"); return -EBUSY; @@ -381,9 +381,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd, } } - q->properties.doorbell_off = - kfd_get_doorbell_dw_offset_in_bar(dev, qpd_to_pdd(qpd), - q->doorbell_id); + q->properties.doorbell_off = amdgpu_doorbell_index_on_bar(dev->adev, + qpd->proc_doorbells, + q->doorbell_id); return 0; } diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c index f7d45057ed32..c9ca21e1a99a 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c +++ b/
Re: [PATCH v3 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
On 2023-06-20 13:16, Shashank Sharma wrote: This patch: - adds a doorbell bo in kfd device structure. - creates doorbell page for kfd kernel usages. - updates the get_kernel_doorbell and free_kernel_doorbell functions accordingly V2: Do not use wrapper API, use direct amdgpu_create_kernel(Alex) V3: - Move single variable declaration below (Christian) - Add a to-do item to reuse the KGD kernel level doorbells for KFD for non-MES cases, instead of reserving one page (Felix) Cc: Alex Deucher Cc: Christian Koenig Cc: Felix Kuehling Signed-off-by: Shashank Sharma Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 - drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 109 +++--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 6 ++ 3 files changed, 39 insertions(+), 78 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 00f528eb9812..36fbe9c840ee 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -437,8 +437,6 @@ struct kfd_dev *kgd2kfd_probe(struct amdgpu_device *adev, bool vf) atomic_set(>compute_profile, 0); mutex_init(>doorbell_mutex); - memset(>doorbell_available_index, 0, - sizeof(kfd->doorbell_available_index)); atomic_set(>sram_ecc_flag, 0); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c index 38c9e1ca6691..f7d45057ed32 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c @@ -61,81 +61,46 @@ size_t kfd_doorbell_process_slice(struct kfd_dev *kfd) /* Doorbell calculations for device init. */ int kfd_doorbell_init(struct kfd_dev *kfd) { - size_t doorbell_start_offset; - size_t doorbell_aperture_size; - size_t doorbell_process_limit; + int size = PAGE_SIZE; + int r; /* -* With MES enabled, just set the doorbell base as it is needed -* to calculate doorbell physical address. 
-*/ - if (kfd->shared_resources.enable_mes) { - kfd->doorbell_base = - kfd->shared_resources.doorbell_physical_address; - return 0; - } - - /* -* We start with calculations in bytes because the input data might -* only be byte-aligned. -* Only after we have done the rounding can we assume any alignment. +* Todo: KFD kernel level operations need only one doorbell for +* ring test/HWS. So instead of reserving a whole page here for +* kernel, reserve and consume a doorbell from existing KGD kernel +* doorbell page. */ - doorbell_start_offset = - roundup(kfd->shared_resources.doorbell_start_offset, - kfd_doorbell_process_slice(kfd)); - - doorbell_aperture_size = - rounddown(kfd->shared_resources.doorbell_aperture_size, - kfd_doorbell_process_slice(kfd)); - - if (doorbell_aperture_size > doorbell_start_offset) - doorbell_process_limit = - (doorbell_aperture_size - doorbell_start_offset) / - kfd_doorbell_process_slice(kfd); - else - return -ENOSPC; - - if (!kfd->max_doorbell_slices || - doorbell_process_limit < kfd->max_doorbell_slices) - kfd->max_doorbell_slices = doorbell_process_limit; - - kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address + - doorbell_start_offset; - - kfd->doorbell_base_dw_offset = doorbell_start_offset / sizeof(u32); - - kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base, - kfd_doorbell_process_slice(kfd)); - - if (!kfd->doorbell_kernel_ptr) + /* Bitmap to dynamically allocate doorbells from kernel page */ + kfd->doorbell_bitmap = bitmap_zalloc(size / sizeof(u32), GFP_KERNEL); + if (!kfd->doorbell_bitmap) { + DRM_ERROR("Failed to allocate kernel doorbell bitmap\n"); return -ENOMEM; + } - pr_debug("Doorbell initialization:\n"); - pr_debug("doorbell base == 0x%08lX\n", - (uintptr_t)kfd->doorbell_base); - - pr_debug("doorbell_base_dw_offset == 0x%08lX\n", - kfd->doorbell_base_dw_offset); - - pr_debug("doorbell_process_limit == 0x%08lX\n", - doorbell_process_limit); - - pr_debug("doorbell_kernel_offset == 0x%08lX\n", - 
(uintptr_t)kfd->doorbell_base); - - pr_debug("doorbell aperture size
Re: [PATCH Review V2 2/2] drm/amdgpu: Disable RAS by default on APU platform
On 2023-07-13 10:50, Stanley.Yang wrote: Disable RAS feature by default for aqua vanjaram on APU platform. Changed from V1: Split "Disable RAS by default on APU platform" into a separate patch. Signed-off-by: Stanley.Yang Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index 8673d9790bb0..ec5f60b64346 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2517,6 +2517,15 @@ static void amdgpu_ras_check_supported(struct amdgpu_device *adev) adev->ras_hw_enabled |= (1 << AMDGPU_RAS_BLOCK__GFX | 1 << AMDGPU_RAS_BLOCK__SDMA | 1 << AMDGPU_RAS_BLOCK__MMHUB); + + if (adev->ip_versions[MP0_HWIP][0] == IP_VERSION(13, 0, 6)) { + /* +* Disable ras feature for aqua vanjaram +* by default on apu platform. +*/ + if (-1 == amdgpu_ras_enable) + amdgpu_ras_enable = 0; Changing a global variable here is probably not appropriate. The condition above looks like this should affect a device-specific variable only. Regards, Felix + } } amdgpu_ras_get_quirks(adev);
Re: [PATCH 3/5] drm/amdkfd: use vma_is_stack() and vma_is_heap()
Allocations in the heap and stack tend to be small, with several allocations sharing the same page. Sharing the same page for different allocations with different access patterns leads to thrashing when we migrate data back and forth on GPU and CPU access. To avoid this we disable HMM migrations for heap and stack VMAs. Regards, Felix On 2023-07-12 at 10:42, Christoph Hellwig wrote: On Wed, Jul 12, 2023 at 10:38:29PM +0800, Kefeng Wang wrote: Use the helpers to simplify code. Nothing against your addition of a helper, but a GPU driver really should have no business even looking at this information..
Re: [PATCH v5 04/10] drm/amdgpu: create GFX-gen11 usermode queue
Am 2023-07-12 um 11:55 schrieb Shashank Sharma: On 11/07/2023 21:51, Felix Kuehling wrote: On 2023-07-06 09:39, Christian König wrote: Am 06.07.23 um 15:37 schrieb Shashank Sharma: On 06/07/2023 15:22, Christian König wrote: Am 06.07.23 um 14:35 schrieb Shashank Sharma: A Memory queue descriptor (MQD) of a userqueue defines it in the hw's context. As MQD format can vary between different graphics IPs, we need gfx GEN specific handlers to create MQDs. This patch: - Introduces MQD handler functions for the usermode queues. - Adds new functions to create and destroy userqueue MQD for GFX-GEN-11 IP V1: Worked on review comments from Alex: - Make MQD functions GEN and IP specific V2: Worked on review comments from Alex: - Reuse the existing adev->mqd[ip] for MQD creation - Formatting and arrangement of code V3: - Integration with doorbell manager V4: Review comments addressed: - Do not create a new file for userq, reuse gfx_v11_0.c (Alex) - Align name of structure members (Luben) - Don't break up the Cc tag list and the Sob tag list in commit message (Luben) V5: - No need to reserve the bo for MQD (Christian). - Some more changes to support IP specific MQD creation. 
Cc: Alex Deucher Cc: Christian Koenig Signed-off-by: Shashank Sharma Signed-off-by: Arvind Yadav --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 16 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 73 +++ .../gpu/drm/amd/include/amdgpu_userqueue.h | 7 ++ 3 files changed, 96 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c index e37b5da5a0d0..bb774144c372 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c @@ -134,12 +134,28 @@ int amdgpu_userq_ioctl(struct drm_device *dev, void *data, return r; } +extern const struct amdgpu_userq_funcs userq_gfx_v11_funcs; + +static void +amdgpu_userqueue_setup_gfx(struct amdgpu_userq_mgr *uq_mgr) +{ + int maj; + struct amdgpu_device *adev = uq_mgr->adev; + uint32_t version = adev->ip_versions[GC_HWIP][0]; + + /* We support usermode queue only for GFX V11 as of now */ + maj = IP_VERSION_MAJ(version); + if (maj == 11) + uq_mgr->userq_funcs[AMDGPU_HW_IP_GFX] = &userq_gfx_v11_funcs; +} + int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct amdgpu_device *adev) { mutex_init(&userq_mgr->userq_mutex); idr_init_base(&userq_mgr->userq_idr, 1); userq_mgr->adev = adev; + amdgpu_userqueue_setup_gfx(userq_mgr); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index c4940b6ea1c4..e76e1b86b434 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -30,6 +30,7 @@ #include "amdgpu_psp.h" #include "amdgpu_smu.h" #include "amdgpu_atomfirmware.h" +#include "amdgpu_userqueue.h" #include "imu_v11_0.h" #include "soc21.h" #include "nvd.h" @@ -6486,3 +6487,75 @@ const struct amdgpu_ip_block_version gfx_v11_0_ip_block = .rev = 0, .funcs = &gfx_v11_0_ip_funcs, }; + +static int gfx_v11_0_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr, + struct drm_amdgpu_userq_in *args_in, + struct amdgpu_usermode_queue *queue) +{ + struct amdgpu_device
*adev = uq_mgr->adev; + struct amdgpu_mqd *mqd_gfx_generic = &adev->mqds[AMDGPU_HW_IP_GFX]; + struct drm_amdgpu_userq_mqd_gfx_v11_0 mqd_user; + struct amdgpu_mqd_prop userq_props; + int r; + + /* Incoming MQD parameters from userspace to be saved here */ + memset(&mqd_user, 0, sizeof(mqd_user)); + + /* Structure to initialize MQD for userqueue using generic MQD init function */ + memset(&userq_props, 0, sizeof(userq_props)); + + if (args_in->mqd_size != sizeof(struct drm_amdgpu_userq_mqd_gfx_v11_0)) { + DRM_ERROR("MQD size mismatch\n"); + return -EINVAL; + } + + if (copy_from_user(&mqd_user, u64_to_user_ptr(args_in->mqd), args_in->mqd_size)) { + DRM_ERROR("Failed to get user MQD\n"); + return -EFAULT; + } + + /* Create BO for actual Userqueue MQD now */ + r = amdgpu_bo_create_kernel(adev, mqd_gfx_generic->mqd_size, PAGE_SIZE, + AMDGPU_GEM_DOMAIN_GTT, + &queue->mqd.obj, + &queue->mqd.gpu_addr, + &queue->mqd.cpu_ptr); + if (r) { + DRM_ERROR("Failed to allocate BO for userqueue (%d)", r); + return -ENOMEM; + } Using amdgpu_bo_create_kernel() for the MQD is most likely not a good idea in the long term, but should work for now. I was a bit curious about this, the scope of this MQD object is kernel internal and used for queue mapping
Re: [PATCH v5 04/10] drm/amdgpu: create GFX-gen11 usermode queue
On 2023-07-06 09:39, Christian König wrote: Am 06.07.23 um 15:37 schrieb Shashank Sharma: On 06/07/2023 15:22, Christian König wrote: Am 06.07.23 um 14:35 schrieb Shashank Sharma: A Memory queue descriptor (MQD) of a userqueue defines it in the hw's context. As MQD format can vary between different graphics IPs, we need gfx GEN specific handlers to create MQDs. This patch: - Introduces MQD handler functions for the usermode queues. - Adds new functions to create and destroy userqueue MQD for GFX-GEN-11 IP V1: Worked on review comments from Alex: - Make MQD functions GEN and IP specific V2: Worked on review comments from Alex: - Reuse the existing adev->mqd[ip] for MQD creation - Formatting and arrangement of code V3: - Integration with doorbell manager V4: Review comments addressed: - Do not create a new file for userq, reuse gfx_v11_0.c (Alex) - Align name of structure members (Luben) - Don't break up the Cc tag list and the Sob tag list in commit message (Luben) V5: - No need to reserve the bo for MQD (Christian). - Some more changes to support IP specific MQD creation. 
Cc: Alex Deucher Cc: Christian Koenig Signed-off-by: Shashank Sharma Signed-off-by: Arvind Yadav [snip] Using amdgpu_bo_create_kernel() for the MQD is most likely not a good idea in the long term, but should work for now. I was a bit curious about this, the scope of this MQD object is kernel internal and used for queue mapping only, userspace doesn't know much about it. Do you still think we should not create a kernel object for it ? Well we should use a kernel BO. But amdgpu_bo_create_kernel() not only creates a kernel BO but also pins it! And that is problematic
Re: [PATCH 3/6] drm/amdkfd: switch over to using drm_exec v2
On 2023-07-11 09:31, Christian König wrote: Avoids quite a bit of logic and kmalloc overhead. v2: fix multiple problems pointed out by Felix Signed-off-by: Christian König Two nit-picks inline about DRM_EXEC_INTERRUPTIBLE_WAIT. With those fixed, the patch is Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/Kconfig| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 5 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 299 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 18 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 4 + drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 45 ++- 6 files changed, 162 insertions(+), 210 deletions(-) [snip] @@ -2538,50 +2489,41 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info, */ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info) { - struct amdgpu_bo_list_entry *pd_bo_list_entries; - struct list_head resv_list, duplicates; - struct ww_acquire_ctx ticket; + struct ttm_operation_ctx ctx = { false, false }; struct amdgpu_sync sync; + struct drm_exec exec; struct amdgpu_vm *peer_vm; struct kgd_mem *mem, *tmp_mem; struct amdgpu_bo *bo; - struct ttm_operation_ctx ctx = { false, false }; - int i, ret; - - pd_bo_list_entries = kcalloc(process_info->n_vms, -sizeof(struct amdgpu_bo_list_entry), -GFP_KERNEL); - if (!pd_bo_list_entries) { - pr_err("%s: Failed to allocate PD BO list entries\n", __func__); - ret = -ENOMEM; - goto out_no_mem; - } - - INIT_LIST_HEAD(_list); - INIT_LIST_HEAD(); + int ret; - /* Get all the page directory BOs that need to be reserved */ - i = 0; - list_for_each_entry(peer_vm, _info->vm_list_head, - vm_list_node) - amdgpu_vm_get_pd_bo(peer_vm, _list, - _bo_list_entries[i++]); - /* Add the userptr_inval_list entries to resv_list */ - list_for_each_entry(mem, _info->userptr_inval_list, - validate_list.head) { - list_add_tail(>resv_list.head, _list); - mem->resv_list.bo = mem->validate_list.bo; - mem->resv_list.num_shared = mem->validate_list.num_shared; - } + 
amdgpu_sync_create(&sync); + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT); This runs in a worker thread. So I think it doesn't need to be interruptible. /* Reserve all BOs and page tables for validation */ - ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates); - WARN(!list_empty(&duplicates), "Duplicates should be empty"); - if (ret) - goto out_free; + drm_exec_until_all_locked(&exec) { + /* Reserve all the page directories */ + list_for_each_entry(peer_vm, &process_info->vm_list_head, + vm_list_node) { + ret = amdgpu_vm_lock_pd(peer_vm, &exec, 2); + drm_exec_retry_on_contention(&exec); + if (unlikely(ret)) + goto unreserve_out; + } - amdgpu_sync_create(&sync); + /* Reserve the userptr_inval_list entries to resv_list */ + list_for_each_entry(mem, &process_info->userptr_inval_list, + validate_list) { + struct drm_gem_object *gobj; + + gobj = &mem->bo->tbo.base; + ret = drm_exec_prepare_obj(&exec, gobj, 1); + drm_exec_retry_on_contention(&exec); + if (unlikely(ret)) + goto unreserve_out; + } + } ret = process_validate_vms(process_info); if (ret) [snip] @@ -1467,25 +1467,24 @@ static int svm_range_reserve_bos(struct svm_validate_context *ctx) uint32_t gpuidx; int r; - INIT_LIST_HEAD(&ctx->validate_list); - for_each_set_bit(gpuidx, ctx->bitmap, MAX_GPU_INSTANCE) { - pdd = kfd_process_device_from_gpuidx(ctx->process, gpuidx); - if (!pdd) { - pr_debug("failed to find device idx %d\n", gpuidx); - return -EINVAL; - } - vm = drm_priv_to_vm(pdd->drm_priv); - - ctx->tv[gpuidx].bo = &vm->root.bo->tbo; - ctx->tv[gpuidx].num_shared = 4; - list_add(&ctx->tv[gpuidx].head, &ctx->validate_list); - } + drm_exec_init(&ctx->exec, DRM_EXEC_INTERRUPTIBLE_WAIT); This function is only called from svm_range_validate_and_map, which has an "intr" parameter. If you pass that through, you could make D
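Both of Felix's nit-picks come down to the same thing: derive the drm_exec flags from the caller's context instead of hard-coding `DRM_EXEC_INTERRUPTIBLE_WAIT`. A minimal model of that suggestion (the constant value here is illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modeled constant; in the kernel this comes from drm_exec.h. */
#define DRM_EXEC_INTERRUPTIBLE_WAIT (1u << 0)

/* Felix's suggestion, modeled: pick the drm_exec flags from an "intr"
 * parameter. A worker thread (validate_invalid_user_pages) would pass
 * intr = false; svm_range_validate_and_map would pass its own intr. */
static uint32_t svm_exec_flags(bool intr)
{
	return intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0;
}
```

In the real code this would be `drm_exec_init(&ctx->exec, intr ? DRM_EXEC_INTERRUPTIBLE_WAIT : 0)`.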
Re: [PATCH] drm/amdkfd: enable grace period for xcp instance
On 2023-07-11 10:28, Eric Huang wrote: Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 --- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 8 --- 3 files changed, 20 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 31cac1fd0d58..9000c4b778fd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -1619,10 +1619,14 @@ static int initialize_cpsch(struct device_queue_manager *dqm) init_sdma_bitmaps(dqm); - if (dqm->dev->kfd2kgd->get_iq_wait_times) + if (dqm->dev->kfd2kgd->get_iq_wait_times) { + u32 first_inst = dqm->dev->xcp->id * +dqm->dev->adev->gfx.num_xcc_per_xcp; dqm->dev->kfd2kgd->get_iq_wait_times(dqm->dev->adev, - &dqm->wait_times, - ffs(dqm->dev->xcc_mask) - 1); + &dqm->wait_times[first_inst], + first_inst); + } + return 0; } @@ -1675,13 +1679,16 @@ static int start_cpsch(struct device_queue_manager *dqm) grace_period); if (retval) pr_err("Setting grace timeout failed\n"); - else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) + else if (dqm->dev->kfd2kgd->build_grace_period_packet_info) { + u32 first_inst = dqm->dev->xcp->id * +dqm->dev->adev->gfx.num_xcc_per_xcp; /* Update dqm->wait_times maintained in software */ dqm->dev->kfd2kgd->build_grace_period_packet_info( - dqm->dev->adev, dqm->wait_times, + dqm->dev->adev, dqm->wait_times[first_inst], grace_period, &reg_offset, - &dqm->wait_times, - ffs(dqm->dev->xcc_mask) - 1); + &dqm->wait_times[first_inst], + first_inst); + } } dqm_unlock(dqm); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h index 7dd4b177219d..45959c33b944 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h +++
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h @@ -262,7 +262,7 @@ struct device_queue_manager { /* used for GFX 9.4.3 only */ uint32_t current_logical_xcc_start; - uint32_t wait_times; + uint32_t wait_times[MAX_XCP]; Why do you need an array here, if it only saves the wait times in one of the array entries [first_inst]? Regards, Felix wait_queue_head_t destroy_wait; }; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c index 8fda16e6fee6..960404a6379b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c @@ -292,17 +292,19 @@ static int pm_set_grace_period_v9(struct packet_manager *pm, struct pm4_mec_write_data_mmio *packet; uint32_t reg_offset = 0; uint32_t reg_data = 0; + uint32_t first_inst = pm->dqm->dev->xcp->id * + pm->dqm->dev->adev->gfx.num_xcc_per_xcp; pm->dqm->dev->kfd2kgd->build_grace_period_packet_info( pm->dqm->dev->adev, - pm->dqm->wait_times, + pm->dqm->wait_times[first_inst], grace_period, &reg_offset, &reg_data, - 0); + first_inst); if (grace_period == USE_DEFAULT_GRACE_PERIOD) - reg_data = pm->dqm->wait_times; + reg_data = pm->dqm->wait_times[first_inst]; packet = (struct pm4_mec_write_data_mmio *)buffer; memset(buffer, 0, sizeof(struct pm4_mec_write_data_mmio));
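The patch computes `first_inst = xcp->id * num_xcc_per_xcp` in three separate places. A tiny helper would remove the repetition; the function name below is illustrative, not an existing KFD symbol:

```c
#include <assert.h>
#include <stdint.h>

/* First XCC instance belonging to a given XCP: XCPs own contiguous
 * runs of num_xcc_per_xcp XCC instances, so the first one is just
 * the partition id times the run length. */
static uint32_t kfd_first_xcc_of_xcp(uint32_t xcp_id,
				     uint32_t num_xcc_per_xcp)
{
	return xcp_id * num_xcc_per_xcp;
}
```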
Re: [PATCH] drm/amdkfd: report dispatch id always saved in ttmps after gc9.4.2
On 2023-07-11 13:19, Jonathan Kim wrote: The feature to save the dispatch ID in trap temporaries 6 & 7 on context save is unconditionally enabled during MQD initialization. Now that TTMPs are always setup regardless of debug mode for GC 9.4.3, we should report that the dispatch ID is always available for debug/trap handling. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 1a4cdee86759..eeedc3ddffeb 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -1941,10 +1941,11 @@ static void kfd_topology_set_capabilities(struct kfd_topology_device *dev) HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX9 | HSA_DBG_WATCH_ADDR_MASK_HI_BIT; - if (KFD_GC_VERSION(dev->gpu) < IP_VERSION(9, 4, 2)) + if (KFD_GC_VERSION(dev->gpu) != IP_VERSION(9, 4, 2)) dev->node_props.debug_prop |= HSA_DBG_DISPATCH_INFO_ALWAYS_VALID; - else + + if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(9, 4, 2)) dev->node_props.capability |= HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED; } else {
Re: [PATCH v2] drm/amdgpu: Increase soft IH ring size
On 2023-07-07 11:49, Philip Yang wrote: Retry faults are delegated to soft IH ring and then processed by deferred worker. Current soft IH ring size PAGE_SIZE can store 128 entries, which may overflow and drop retry faults, causes HW stucks because the retry fault is not recovered. Increase soft IH ring size to 8KB, enough to store 256 CAM entries because we clear the CAM entry after handling the retry fault from soft ring. Define macro IH_RING_SIZE and IH_SW_RING_SIZE to remove duplicate constant. Show warning message if soft IH ring overflows because this should not happen. It would indicate a problem with the CAM or it could happen on older GPUs that don't have a CAM. See below. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 7 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- drivers/gpu/drm/amd/amdgpu/ih_v6_0.c| 4 ++-- drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 4 ++-- 7 files changed, 20 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index fceb3b384955..51a0dbd2358a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -138,6 +138,7 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) /** * amdgpu_ih_ring_write - write IV to the ring buffer * + * @adev: amdgpu_device pointer * @ih: ih ring to write to * @iv: the iv to write * @num_dw: size of the iv in dw @@ -145,8 +146,8 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) * Writes an IV to the ring buffer using the CPU and increment the wptr. * Used for testing and delegating IVs to a software ring. 
*/ -void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, - unsigned int num_dw) +void amdgpu_ih_ring_write(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, + const uint32_t *iv, unsigned int num_dw) { uint32_t wptr = le32_to_cpu(*ih->wptr_cpu) >> 2; unsigned int i; @@ -161,6 +162,9 @@ void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, if (wptr != READ_ONCE(ih->rptr)) { wmb(); WRITE_ONCE(*ih->wptr_cpu, cpu_to_le32(wptr)); + } else { + dev_warn(adev->dev, "IH soft ring buffer overflow 0x%X, 0x%X\n", +wptr, ih->rptr); If this happens, it's probably going to flood the log. It would be a good idea to apply a rate-limit, or use dev_warn_once. With that fixed, the patch is Reviewed-by: Felix Kuehling } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index dd1c2eded6b9..6c6184f0dbc1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -27,6 +27,9 @@ /* Maximum number of IVs processed at once */ #define AMDGPU_IH_MAX_NUM_IVS 32 +#define IH_RING_SIZE (256 * 1024) +#define IH_SW_RING_SIZE(8 * 1024) /* enough for 256 CAM entries */ + struct amdgpu_device; struct amdgpu_iv_entry; @@ -97,8 +100,8 @@ struct amdgpu_ih_funcs { int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, unsigned ring_size, bool use_bus_addr); void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); -void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, - unsigned int num_dw); +void amdgpu_ih_ring_write(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, + const uint32_t *iv, unsigned int num_dw); int amdgpu_ih_wait_on_checkpoint_process_ts(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 5273decc5753..fa6d0adcec20 
100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -493,7 +493,7 @@ void amdgpu_irq_delegate(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, unsigned int num_dw) { - amdgpu_ih_ring_write(&adev->irq.ih_soft, entry->iv_entry, num_dw); + amdgpu_ih_ring_write(adev, &adev->irq.ih_soft, entry->iv_entry, num_dw); schedule_work(&adev->irq.ih_soft_work); } diff --git a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c index b02e1cef78a7..980b24120080 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c @@ -535,7 +53
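Felix's review request maps onto the kernel's existing rate-limited print helpers (`dev_warn_ratelimited()` or `dev_warn_once()`). As a standalone sketch, the warn-once behaviour he suggests is just a latch; the helper below is a userspace model, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of "warn once" suppression: the first call reports the
 * condition, every later call is silenced so an overflowing soft IH
 * ring cannot flood the log. */
static bool warn_once(void)
{
	static bool warned;

	if (warned)
		return false;  /* suppress repeat warnings */
	warned = true;
	return true;           /* emit this one */
}
```

In the driver the fix is a one-line change: use `dev_warn_ratelimited(adev->dev, ...)` (or `dev_warn_once`) instead of `dev_warn` in the overflow branch.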
Re: [PATCH] drm/amdgpu: Increase IH soft ring size
Am 2023-07-07 um 10:14 schrieb Philip Yang: Retry faults are delegated to IH soft ring and then processed by deferred worker. Current IH soft ring size PAGE_SIZE can store 128 entries, which may overflow and drop retry faults, causes HW stucks because the retry fault is not recovered. Increase IH soft ring size to the same size as IH ring, define macro IH_RING_SIZE to remove duplicate constant. As discussed offline, dropping retry fault interrupts is only a problem when the CAM is enabled. You only need as many entries in the soft IH ring as there are entries in the CAM. Regards, Felix Show warning message if IH soft ring overflows because this should not happen any more. Signed-off-by: Philip Yang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 8 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 2 +- drivers/gpu/drm/amd/amdgpu/ih_v6_0.c| 5 +++-- drivers/gpu/drm/amd/amdgpu/navi10_ih.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 5 +++-- drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 5 +++-- 7 files changed, 21 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c index fceb3b384955..51a0dbd2358a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c @@ -138,6 +138,7 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) /** * amdgpu_ih_ring_write - write IV to the ring buffer * + * @adev: amdgpu_device pointer * @ih: ih ring to write to * @iv: the iv to write * @num_dw: size of the iv in dw @@ -145,8 +146,8 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih) * Writes an IV to the ring buffer using the CPU and increment the wptr. * Used for testing and delegating IVs to a software ring. 
*/ -void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, - unsigned int num_dw) +void amdgpu_ih_ring_write(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, + const uint32_t *iv, unsigned int num_dw) { uint32_t wptr = le32_to_cpu(*ih->wptr_cpu) >> 2; unsigned int i; @@ -161,6 +162,9 @@ void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, if (wptr != READ_ONCE(ih->rptr)) { wmb(); WRITE_ONCE(*ih->wptr_cpu, cpu_to_le32(wptr)); + } else { + dev_warn(adev->dev, "IH soft ring buffer overflow 0x%X, 0x%X\n", +wptr, ih->rptr); } } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h index dd1c2eded6b9..a8cf67f1f011 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h @@ -97,8 +97,8 @@ struct amdgpu_ih_funcs { int amdgpu_ih_ring_init(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, unsigned ring_size, bool use_bus_addr); void amdgpu_ih_ring_fini(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); -void amdgpu_ih_ring_write(struct amdgpu_ih_ring *ih, const uint32_t *iv, - unsigned int num_dw); +void amdgpu_ih_ring_write(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih, + const uint32_t *iv, unsigned int num_dw); int amdgpu_ih_wait_on_checkpoint_process_ts(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c index 5273decc5753..fa6d0adcec20 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c @@ -493,7 +493,7 @@ void amdgpu_irq_delegate(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry, unsigned int num_dw) { - amdgpu_ih_ring_write(&adev->irq.ih_soft, entry->iv_entry, num_dw); + amdgpu_ih_ring_write(adev, &adev->irq.ih_soft, entry->iv_entry, num_dw); schedule_work(&adev->irq.ih_soft_work); } diff --git
a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c index b02e1cef78a7..21d2e57cffe7 100644 --- a/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c +++ b/drivers/gpu/drm/amd/amdgpu/ih_v6_0.c @@ -32,6 +32,7 @@ #include "soc15_common.h" #include "ih_v6_0.h" +#define IH_RING_SIZE (256 * 1024) #define MAX_REARM_RETRY 10 static void ih_v6_0_set_interrupt_funcs(struct amdgpu_device *adev); @@ -535,7 +536,7 @@ static int ih_v6_0_sw_init(void *handle) * use bus address for ih ring by psp bl */ use_bus_addr = (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) ? false : true; - r = amdgpu_ih_ring_init(adev, &adev->irq.ih, 256 * 1024, use_bus_addr); + r = amdgpu_ih_ring_init(adev,
Re: [PATCH] drm/amdkfd: Access gpuvm_export_dmabuf() api
Am 2023-06-20 um 22:11 schrieb Ramesh Errabolu: Call KFD api to get Dmabuf instead of calling GEM Prime API Signed-off-by: Ramesh Errabolu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index cf1db0ab3471..c37d82b35372 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -1852,13 +1852,13 @@ static uint32_t get_process_num_bos(struct kfd_process *p) return num_of_bos; } -static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags, +static int criu_get_prime_handle(struct kgd_mem *mem, int flags, u32 *shared_fd) { struct dma_buf *dmabuf; int ret; - dmabuf = amdgpu_gem_prime_export(gobj, flags); + ret = amdgpu_amdkfd_gpuvm_export_dmabuf(mem, &dmabuf); if (IS_ERR(dmabuf)) { I think you need to check ret here instead of IS_ERR(dmabuf). Please also check with Rajneesh. I think he ran into this before and I discussed this fix with him. Otherwise the patch looks reasonable to me. Thanks, Felix ret = PTR_ERR(dmabuf); pr_err("dmabuf export failed for the BO\n"); @@ -1940,7 +1940,7 @@ static int criu_checkpoint_bos(struct kfd_process *p, } if (bo_bucket->alloc_flags & (KFD_IOC_ALLOC_MEM_FLAGS_VRAM | KFD_IOC_ALLOC_MEM_FLAGS_GTT)) { - ret = criu_get_prime_handle(&dumper_bo->tbo.base, + ret = criu_get_prime_handle(kgd_mem, bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0, &bo_bucket->dmabuf_fd); @@ -2402,7 +2402,7 @@ static int criu_restore_bo(struct kfd_process *p, /* create the dmabuf object and export the bo */ if (bo_bucket->alloc_flags & (KFD_IOC_ALLOC_MEM_FLAGS_VRAM | KFD_IOC_ALLOC_MEM_FLAGS_GTT)) { - ret = criu_get_prime_handle(&kgd_mem->bo->tbo.base, DRM_RDWR, + ret = criu_get_prime_handle(kgd_mem, DRM_RDWR, &bo_bucket->dmabuf_fd); if (ret) return ret;
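Felix's point is that the new helper returns an error code and may leave the out-pointer untouched, so `IS_ERR(dmabuf)` can miss a real failure (or trip on stack garbage). A simplified userspace model of the corrected check (types and names here are illustrative, not the amdkfd API):

```c
#include <assert.h>
#include <stddef.h>

/* Models amdgpu_amdkfd_gpuvm_export_dmabuf: returns 0 or a negative
 * errno, and only writes *out on success. */
static int export_dmabuf_model(int backend_ret, void **out, void *buf)
{
	if (backend_ret)
		return backend_ret;  /* *out not written on error */
	*out = buf;
	return 0;
}

/* Models the fixed criu_get_prime_handle: test the return value,
 * not the (possibly stale) out-pointer. */
static int criu_get_handle_model(int backend_ret, void *buf)
{
	void *dmabuf = (void *)1;  /* stale value, like an uninitialized stack var */
	int ret = export_dmabuf_model(backend_ret, &dmabuf, buf);

	if (ret)  /* correct: check ret, not IS_ERR(dmabuf) */
		return ret;
	return 0;
}
```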
Re: [PATCH] drm/amdgpu: Forbid kfd using cpu to update pt if vm is shared with gfx
Can we change the flags if needed. E.g. see what amdgpu_bo_pin_restricted does: if (!(bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS)) bo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; amdgpu_bo_placement_from_domain(bo, domain); This shouldn't really change anything about the BO placement because we only enable CPU page table updates on large-BAR GPUs by default. Alternatively, we could create VM BOs with AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED on large-BAR GPUs to make it possible to switch to CPU page table updates for compute VMs. Regards, Felix Am 2023-06-21 um 05:46 schrieb YuBiao Wang: If a same GPU VM is shared by kfd and graphic operations, we must align the vm update mode to sdma, or cpu kmap will fail and cause null pointer issue. Signed-off-by: YuBiao Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 291977b93b1d..e105ff9e8041 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2239,6 +2239,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm) int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) { bool pte_support_ats = (adev->asic_type == CHIP_RAVEN); + struct amdgpu_bo *bo = vm->root.bo; int r; r = amdgpu_bo_reserve(vm->root.bo, true); @@ -2265,6 +2266,10 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) /* Update VM state */ vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode & AMDGPU_VM_USE_CPU_FOR_COMPUTE); + + if (bo && !(bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) + vm->use_cpu_for_update = false; + DRM_DEBUG_DRIVER("VM update mode is %s\n", vm->use_cpu_for_update ? "CPU" : "SDMA"); WARN_ONCE((vm->use_cpu_for_update &&
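The flag update Felix quotes from `amdgpu_bo_pin_restricted` is easy to model: request CPU access unless the BO explicitly forbids it. The bit values below are illustrative placeholders, not the real amdgpu flag positions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the AMDGPU_GEM_CREATE_* flags. */
#define GEM_CREATE_CPU_ACCESS_REQUIRED (1u << 0)
#define GEM_CREATE_NO_CPU_ACCESS       (1u << 1)

/* Model of the quoted snippet: add CPU-access-required unless the BO
 * was created with no-CPU-access. On a large-BAR GPU this is what
 * would let page-table BOs be kmap'ed for CPU updates. */
static uint32_t require_cpu_access(uint32_t flags)
{
	if (!(flags & GEM_CREATE_NO_CPU_ACCESS))
		flags |= GEM_CREATE_CPU_ACCESS_REQUIRED;
	return flags;
}
```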
Re: [PATCHv4] drm/amdgpu: Update invalid PTE flag setting
On 2023-06-19 13:38, Mukul Joshi wrote: Update the invalid PTE flag setting with TF enabled. This is to ensure, in addition to transitioning the retry fault to a no-retry fault, it also causes the wavefront to enter the trap handler. With the current setting, the fault only transitions to a no-retry fault. Additionally, have 2 sets of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- v1->v2: - Update handling according to Christian's feedback. v2->v3: - Remove ASIC specific callback (Felix). v3->v4: - Add noretry flag to amdgpu->gmc. This allows to set ASIC specific flags. drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 6 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 31 +++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c| 1 + drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 1 + drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 + 9 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 56d73fade568..fdc25cd559b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -331,6 +331,8 @@ struct amdgpu_gmc { u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_LO32[16]; u64 VM_CONTEXT_PAGE_TABLE_END_ADDR_HI32[16]; u64 MC_VM_MX_L1_TLB_CNTL; + + u64 noretry_flags; }; #define amdgpu_gmc_flush_gpu_tlb(adev, vmid, vmhub, type) ((adev)->gmc.gmc_funcs->flush_gpu_tlb((adev), (vmid), (vmhub), (type))) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index eff73c428b12..8c7861a4d75d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2604,7 +2604,7 @@ bool amdgpu_vm_handle_fault(struct 
amdgpu_device *adev, u32 pasid, /* Intentionally setting invalid PTE flag * combination to force a no-retry-fault */ - flags = AMDGPU_PTE_SNOOPED | AMDGPU_PTE_PRT; + flags = AMDGPU_VM_NORETRY_FLAGS; value = 0; } else if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) { /* Redirect the access to the dummy page */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 9c85d494f2a2..b81fcb962d8f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -84,7 +84,13 @@ struct amdgpu_mem_stats; /* PDE Block Fragment Size for VEGA10 */ #define AMDGPU_PDE_BFS(a) ((uint64_t)a << 59) +/* Flag combination to set no-retry with TF disabled */ +#define AMDGPU_VM_NORETRY_FLAGS(AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE | \ + AMDGPU_PTE_TF) +/* Flag combination to set no-retry with TF enabled */ +#define AMDGPU_VM_NORETRY_FLAGS_TF (AMDGPU_PTE_VALID | AMDGPU_PTE_SYSTEM | \ + AMDGPU_PTE_PRT) /* For GFX9 */ #define AMDGPU_PTE_MTYPE_VG10(a) ((uint64_t)(a) << 57) #define AMDGPU_PTE_MTYPE_VG10_MASKAMDGPU_PTE_MTYPE_VG10(3ULL) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index dea1a64be44d..24ddf6a0512a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -778,6 +778,27 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params, 1, 0, flags); } +/** + * amdgpu_vm_pte_update_noretry_flags - Update PTE no-retry flags + * + * @adev - amdgpu_device pointer + * @flags: pointer to PTE flags + * + * Update PTE no-retry flags when TF is enabled. + */ +static void amdgpu_vm_pte_update_noretry_flags(struct amdgpu_device *adev, + uint64_t *flags) +{ + /* +* Update no-retry flags with the corresponding TF +* no-retry combination. 
+*/ + if ((*flags & AMDGPU_VM_NORETRY_FLAGS) == AMDGPU_VM_NORETRY_FLAGS) { + *flags &= ~AMDGPU_VM_NORETRY_FLAGS; + *flags |= adev->gmc.noretry_flags; + } +} + /* * amdgpu_vm_pte_update_flags - figure out flags for PTE updates * @@ -804,6 +825,16 @@ static void amdgpu_vm_pte_update_flags(struct amdgpu_vm_update_params *params, flags |= AMDGPU_PTE_EXECUTABLE; } + /* +* Update no-retry flags to use the no-retry flag combination +* with TF enabled. The AMDGPU_VM_NORETRY_FLAGS flag combination +* does not work when TF is enabled. So, replace them w
Re: [PATCH] drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute
On 2023-06-19 17:28, Xiaogang.Chen wrote: From: Xiaogang Chen Since we allow KFD and graphics to operate on the same GPU VM for interoperation between them, the GPU VM may have been used by graphics VM operations before KFD turns it into a compute VM. Remove the VM clean check from amdgpu_vm_make_compute. Signed-off-by: Xiaogang Chen Reviewed-by: Felix Kuehling As discussed, we can follow this up with a change that enables ATS for graphics VMs as well, so we don't need to enable ATS in amdgpu_vm_make_compute. This would improve interop for Raven. We only enable ATS for the lower half of the address space, so it should not affect graphics clients that use the upper half. Thanks, Felix --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index eff73c428b12..291977b93b1d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2245,16 +2245,16 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) if (r) return r; - /* Sanity checks */ - if (!amdgpu_vm_pt_is_root_clean(adev, vm)) { - r = -EINVAL; - goto unreserve_bo; - } - /* Check if PD needs to be reinitialized and do it before * changing any other state, in case it fails. */ if (pte_support_ats != vm->pte_support_ats) { + /* Sanity checks */ + if (!amdgpu_vm_pt_is_root_clean(adev, vm)) { + r = -EINVAL; + goto unreserve_bo; + } + vm->pte_support_ats = pte_support_ats; r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo), false);
Re: [PATCH] drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute
On 2023-06-19 15:06, Xiaogang.Chen wrote: From: Xiaogang Chen Since we allow KFD and graphics to operate on the same GPU VM for interoperation between them, the GPU VM may have been used by graphics VM operations before KFD turns a GFX VM into a compute VM. Remove the VM clean check from amdgpu_vm_make_compute. Signed-off-by: Xiaogang Chen --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index eff73c428b12..33f05297ab7e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2246,7 +2246,7 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm) return r; /* Sanity checks */ - if (!amdgpu_vm_pt_is_root_clean(adev, vm)) { + if (pte_support_ats && !amdgpu_vm_pt_is_root_clean(adev, vm)) { I think the correct condition here would be "pte_support_ats != vm->pte_support_ats", because that's what's used to reinitialize the page table just below. I think it would be even cleaner if you moved that check inside the "if (pte_support_ats != vm->pte_support_ats)" block below. Regards, Felix r = -EINVAL; goto unreserve_bo; }
Re: [PATCHv2] drm/amdkfd: Enable GWS on GFX9.4.3
On 2023-06-16 14:44, Mukul Joshi wrote: Enable GWS capable queue creation for forward progress guarantee on GFX 9.4.3. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- v1->v2: - Update the condition for setting pqn->q->gws for GFX 9.4.3. drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 + .../amd/amdkfd/kfd_process_queue_manager.c| 35 --- 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 9d4abfd8b55e..226d2dd7fa49 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -518,6 +518,7 @@ static int kfd_gws_init(struct kfd_node *node) && kfd->mec2_fw_version >= 0x30) || (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 2) && kfd->mec2_fw_version >= 0x28) || + (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 3)) || (KFD_GC_VERSION(node) >= IP_VERSION(10, 3, 0) && KFD_GC_VERSION(node) < IP_VERSION(11, 0, 0) && kfd->mec2_fw_version >= 0x6b diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c index 9ad1a2186a24..ba9d69054119 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c @@ -123,16 +123,24 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid, if (!gws && pdd->qpd.num_gws == 0) return -EINVAL; - if (gws) - ret = amdgpu_amdkfd_add_gws_to_process(pdd->process->kgd_process_info, - gws, ); - else - ret = amdgpu_amdkfd_remove_gws_from_process(pdd->process->kgd_process_info, - pqn->q->gws); - if (unlikely(ret)) - return ret; + if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3)) { + if (gws) + ret = amdgpu_amdkfd_add_gws_to_process(pdd->process->kgd_process_info, + gws, ); + else + ret = amdgpu_amdkfd_remove_gws_from_process(pdd->process->kgd_process_info, + pqn->q->gws); + if (unlikely(ret)) + return ret; + pqn->q->gws = mem; + } else { + /* +* Intentionally set GWS to a non-NULL
value +* for GFX 9.4.3. +*/ + pqn->q->gws = gws ? ERR_PTR(-ENOMEM) : NULL; + } - pqn->q->gws = mem; pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0; return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm, @@ -164,7 +172,8 @@ void pqm_uninit(struct process_queue_manager *pqm) struct process_queue_node *pqn, *next; list_for_each_entry_safe(pqn, next, >queues, process_queue_list) { - if (pqn->q && pqn->q->gws) + if (pqn->q && pqn->q->gws && + KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3)) amdgpu_amdkfd_remove_gws_from_process(pqm->process->kgd_process_info, pqn->q->gws); kfd_procfs_del_queue(pqn->q); @@ -446,8 +455,10 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid) } if (pqn->q->gws) { - amdgpu_amdkfd_remove_gws_from_process(pqm->process->kgd_process_info, - pqn->q->gws); + if (KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3)) + amdgpu_amdkfd_remove_gws_from_process( + pqm->process->kgd_process_info, + pqn->q->gws); pdd->qpd.num_gws = 0; }
Re: [PATCH] drm/amdkfd: Use KIQ to unmap HIQ
On 2023-06-16 14:00, Mukul Joshi wrote: Currently, we unmap HIQ by directly writing to HQD registers. This doesn't work for GFX9.4.3. Instead, use KIQ to unmap HIQ, similar to how we use KIQ to map HIQ. Using KIQ to unmap HIQ works for all GFX series post GFXv9. Signed-off-by: Mukul Joshi --- .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 1 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 47 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h| 3 ++ .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10_3.c | 1 + .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 47 ++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 48 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 3 ++ drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 8 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h | 4 ++ .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 2 +- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 2 +- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 7 ++- .../gpu/drm/amd/include/kgd_kfd_interface.h | 3 ++ 13 files changed, 170 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c index 5b4b7f8b92a5..b82435e17ed0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c @@ -372,6 +372,7 @@ const struct kfd2kgd_calls gc_9_4_3_kfd2kgd = { .hqd_sdma_dump = kgd_gfx_v9_4_3_hqd_sdma_dump, .hqd_is_occupied = kgd_gfx_v9_hqd_is_occupied, .hqd_sdma_is_occupied = kgd_gfx_v9_4_3_hqd_sdma_is_occupied, + .hiq_hqd_destroy = kgd_gfx_v9_hiq_hqd_destroy, .hqd_destroy = kgd_gfx_v9_hqd_destroy, .hqd_sdma_destroy = kgd_gfx_v9_4_3_hqd_sdma_destroy, .wave_control_execute = kgd_gfx_v9_wave_control_execute, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c index 8ad7a7779e14..a919fb8e09a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -510,6 +510,52 @@ 
static bool kgd_hqd_sdma_is_occupied(struct amdgpu_device *adev, void *mqd) return false; } +int kgd_gfx_v10_hiq_hqd_destroy(struct amdgpu_device *adev, void *mqd, + uint32_t pipe_id, uint32_t queue_id, + uint32_t inst) +{ + struct amdgpu_ring *kiq_ring = >gfx.kiq[0].ring; + struct v10_compute_mqd *m = get_mqd(mqd); + uint32_t mec, pipe; + uint32_t doorbell_off; + int r; + + doorbell_off = m->cp_hqd_pq_doorbell_control >> + CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT; + + acquire_queue(adev, pipe_id, queue_id); + + mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1; + pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec); + + spin_lock(>gfx.kiq[0].ring_lock); + r = amdgpu_ring_alloc(kiq_ring, 6); + if (r) { + pr_err("Failed to alloc KIQ (%d).\n", r); + goto out_unlock; + } + + amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4)); + amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */ + PACKET3_UNMAP_QUEUES_ACTION(0) | + PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) | + PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) | + PACKET3_UNMAP_QUEUES_NUM_QUEUES(1)); + amdgpu_ring_write(kiq_ring, + PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(doorbell_off)); + amdgpu_ring_write(kiq_ring, 0); + amdgpu_ring_write(kiq_ring, 0); + amdgpu_ring_write(kiq_ring, 0); This looks like you're duplicating the functionality in kiq->pmf->kiq_unmap_queues. Can we just call that instead? See amdgpu_gfx_disable_kcq for example. 
Regards, Felix + + amdgpu_ring_commit(kiq_ring); + +out_unlock: + spin_unlock(>gfx.kiq[0].ring_lock); + release_queue(adev); + + return r; +} + static int kgd_hqd_destroy(struct amdgpu_device *adev, void *mqd, enum kfd_preempt_type reset_type, unsigned int utimeout, uint32_t pipe_id, @@ -1034,6 +1080,7 @@ const struct kfd2kgd_calls gfx_v10_kfd2kgd = { .hqd_sdma_dump = kgd_hqd_sdma_dump, .hqd_is_occupied = kgd_hqd_is_occupied, .hqd_sdma_is_occupied = kgd_hqd_sdma_is_occupied, + .hiq_hqd_destroy = kgd_gfx_v10_hiq_hqd_destroy, .hqd_destroy = kgd_hqd_destroy, .hqd_sdma_destroy = kgd_hqd_sdma_destroy, .wave_control_execute = kgd_wave_control_execute, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h index e6b70196071a..00b4514ebdd5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.h +++
Re: [PATCH] drm/amdkfd: Enable GWS on GFX9.4.3
On 2023-06-16 13:59, Mukul Joshi wrote: Enable GWS capable queue creation for forward progress gaurantee on GFX 9.4.3. Signed-off-by: Mukul Joshi --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 + .../amd/amdkfd/kfd_process_queue_manager.c| 31 --- 2 files changed, 20 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index 9d4abfd8b55e..226d2dd7fa49 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -518,6 +518,7 @@ static int kfd_gws_init(struct kfd_node *node) && kfd->mec2_fw_version >= 0x30) || (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 2) && kfd->mec2_fw_version >= 0x28) || + (KFD_GC_VERSION(node) == IP_VERSION(9, 4, 3)) || (KFD_GC_VERSION(node) >= IP_VERSION(10, 3, 0) && KFD_GC_VERSION(node) < IP_VERSION(11, 0, 0) && kfd->mec2_fw_version >= 0x6b diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c index 9ad1a2186a24..9a091d8f9aaf 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c @@ -123,16 +123,20 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid, if (!gws && pdd->qpd.num_gws == 0) return -EINVAL; - if (gws) - ret = amdgpu_amdkfd_add_gws_to_process(pdd->process->kgd_process_info, - gws, ); - else - ret = amdgpu_amdkfd_remove_gws_from_process(pdd->process->kgd_process_info, - pqn->q->gws); - if (unlikely(ret)) - return ret; + if (KFD_GC_VERSION(dev) != IP_VERSION(9, 4, 3)) { + if (gws) + ret = amdgpu_amdkfd_add_gws_to_process(pdd->process->kgd_process_info, + gws, ); + else + ret = amdgpu_amdkfd_remove_gws_from_process(pdd->process->kgd_process_info, + pqn->q->gws); + if (unlikely(ret)) + return ret; + pqn->q->gws = mem; + } else { + pqn->q->gws = ERR_PTR(-ENOMEM); I think this needs to be pqn->q->gws = gws ? 
ERR_PTR(-ENOMEM) : NULL; Regards, Felix + } - pqn->q->gws = mem; pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0; return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm, @@ -164,7 +168,8 @@ void pqm_uninit(struct process_queue_manager *pqm) struct process_queue_node *pqn, *next; list_for_each_entry_safe(pqn, next, >queues, process_queue_list) { - if (pqn->q && pqn->q->gws) + if (pqn->q && pqn->q->gws && + KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3)) amdgpu_amdkfd_remove_gws_from_process(pqm->process->kgd_process_info, pqn->q->gws); kfd_procfs_del_queue(pqn->q); @@ -446,8 +451,10 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid) } if (pqn->q->gws) { - amdgpu_amdkfd_remove_gws_from_process(pqm->process->kgd_process_info, - pqn->q->gws); + if (KFD_GC_VERSION(pqn->q->device) != IP_VERSION(9, 4, 3)) + amdgpu_amdkfd_remove_gws_from_process( + pqm->process->kgd_process_info, + pqn->q->gws); pdd->qpd.num_gws = 0; }
Re: [PATCH] drm/amdgpu: Modify for_each_inst macro
On 2023-06-16 06:23, Lijo Lazar wrote: Modify it such that it doesn't change the instance mask parameter. Signed-off-by: Lijo Lazar Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index f4029c13a9be..c5451a9b0ee4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1295,9 +1295,9 @@ int emu_soc_asic_init(struct amdgpu_device *adev); #define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter)); -#define for_each_inst(i, inst_mask)\ - for (i = ffs(inst_mask) - 1; inst_mask;\ -inst_mask &= ~(1U << i), i = ffs(inst_mask) - 1) +#define for_each_inst(i, inst_mask)\ + for (i = ffs(inst_mask); i-- != 0; \ +i = ffs((inst_mask & (~0U << (i + 1))))) #define MIN(X, Y) ((X) < (Y) ? (X) : (Y))
Re: [PATCH] drm/amdkfd: set coherent host access capability flag
On 2023-06-16 00:29, Felix Kuehling wrote: On 2023-06-15 18:54, Alex Sierra wrote: This flag determines whether the host possesses coherent access to the memory of the device. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 90b86a6ac7bd..7ede3de4f7fb 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -2107,6 +2107,10 @@ int kfd_topology_add_device(struct kfd_node *gpu) if (KFD_IS_SVM_API_SUPPORTED(dev->gpu->adev)) dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED; + if (dev->gpu->adev->gmc.is_app_apu | + dev->gpu->adev->gmc.xgmi.connected_to_cpu) + dev->node_props.capability |= HSA_CAP_FLAGS_COHERENTHOSTACCESS; I believe this is not true for "small APUs" because they map the framebuffer as WC on the CPU. I think you need to check specifically for APP APU. Never mind, I read it wrong. You are checking the correct APP APU flag. Just one more nit-pick, in the condition you should use logical OR (a || b), not bit-wise OR (a | b). With that fixed, the patch is Reviewed-by: Felix Kuehling Regards, Felix + kfd_debug_print_topology(); kfd_notify_gpu_change(gpu_id, 1);
Re: [PATCH] drm/amdkfd: set coherent host access capability flag
On 2023-06-15 18:54, Alex Sierra wrote: This flag determines whether the host possesses coherent access to the memory of the device. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c index 90b86a6ac7bd..7ede3de4f7fb 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c @@ -2107,6 +2107,10 @@ int kfd_topology_add_device(struct kfd_node *gpu) if (KFD_IS_SVM_API_SUPPORTED(dev->gpu->adev)) dev->node_props.capability |= HSA_CAP_SVMAPI_SUPPORTED; + if (dev->gpu->adev->gmc.is_app_apu | + dev->gpu->adev->gmc.xgmi.connected_to_cpu) + dev->node_props.capability |= HSA_CAP_FLAGS_COHERENTHOSTACCESS; I believe this is not true for "small APUs" because they map the framebuffer as WC on the CPU. I think you need to check specifically for APP APU. Regards, Felix + kfd_debug_print_topology(); kfd_notify_gpu_change(gpu_id, 1);
Re: [PATCH 2/3] drm/amdgpu: Implement new dummy vram manager
Am 2023-06-15 um 03:37 schrieb Christian König: Am 14.06.23 um 17:42 schrieb Felix Kuehling: Am 2023-06-14 um 06:38 schrieb Christian König: Am 10.05.23 um 00:01 schrieb Alex Deucher: From: Rajneesh Bhardwaj This adds dummy vram manager to support ASICs that do not have a dedicated or carvedout vram domain. Well that doesn't seem to make much sense. Why we should have that? TTM always expects a resource manager for VRAM. There are no NULL pointer checks in TTM for not having a resource manager for VRAM. The existing amdgpu_vram_mgr gets confused if there is no VRAM. It seemed cleaner to add a dummy manager than to scatter conditions for a memory-less GPU corner case through the regular VRAM manager. Well no that's absolutely *not* cleaner. TTM has a predefined manager if you need to use a dummy. I think you are referring to ttm_range_manager. ttm_range_man_alloc does a bunch of useless stuff when there is no hope of succeeding: * kzalloc a node struct * ttm_resource_init o add the node to an LRU * drm_mm_insert_node_in_range (which fails because the drm_mm was created with 0 size) * ttm_resource_fini o remove the node from an LRU * kfree the node struct In that process it also takes 3 spin_locks. All of that for TTM to figure out that VRAM is not a feasible placement. All we need to do here in the dummy manager is to return -ENOSPC. I really don't get why this bothers you so much, or why this is even controversial. Regards, Felix Why the heck didn't you ask me before doing stuff like that? Regards, Christian. Regards, Felix Christian. 
Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 67 ++-- 1 file changed, 60 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 43d6a9d6a538..89d35d194f2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -370,6 +370,45 @@ int amdgpu_vram_mgr_query_page_status(struct amdgpu_vram_mgr *mgr, return ret; } +static void amdgpu_dummy_vram_mgr_debug(struct ttm_resource_manager *man, + struct drm_printer *printer) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr debug\n"); +} + +static bool amdgpu_dummy_vram_mgr_compatible(struct ttm_resource_manager *man, + struct ttm_resource *res, + const struct ttm_place *place, + size_t size) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr compatible\n"); + return false; +} + +static bool amdgpu_dummy_vram_mgr_intersects(struct ttm_resource_manager *man, + struct ttm_resource *res, + const struct ttm_place *place, + size_t size) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr intersects\n"); + return true; +} + +static void amdgpu_dummy_vram_mgr_del(struct ttm_resource_manager *man, + struct ttm_resource *res) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr deleted\n"); +} + +static int amdgpu_dummy_vram_mgr_new(struct ttm_resource_manager *man, + struct ttm_buffer_object *tbo, + const struct ttm_place *place, + struct ttm_resource **res) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr new\n"); + return -ENOSPC; +} + /** * amdgpu_vram_mgr_new - allocate new ranges * @@ -817,6 +856,14 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man, mutex_unlock(>lock); } +static const struct ttm_resource_manager_func amdgpu_dummy_vram_mgr_func = { + .alloc = amdgpu_dummy_vram_mgr_new, + .free = amdgpu_dummy_vram_mgr_del, + .intersects = amdgpu_dummy_vram_mgr_intersects, + .compatible = amdgpu_dummy_vram_mgr_compatible, + .debug = 
amdgpu_dummy_vram_mgr_debug +}; + static const struct ttm_resource_manager_func amdgpu_vram_mgr_func = { .alloc = amdgpu_vram_mgr_new, .free = amdgpu_vram_mgr_del, @@ -841,17 +888,22 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) ttm_resource_manager_init(man, >mman.bdev, adev->gmc.real_vram_size); - man->func = _vram_mgr_func; - - err = drm_buddy_init(>mm, man->size, PAGE_SIZE); - if (err) - return err; - mutex_init(>lock); INIT_LIST_HEAD(>reservations_pending); INIT_LIST_HEAD(>reserved_pages); mgr->default_page_size = PAGE_SIZE; + if (!adev->gmc.is_app_apu) { + man->func = _vram_mgr_func; + + err = drm_buddy_init(>mm, man->size, PAGE_SIZE); + if (err) + return err; + } else { + man->func = _dummy_vram_mgr_func; + DRM_INFO("Setup dummy vram mgr\n"); + } + t
Re: [PATCH 2/3] drm/amdgpu: Implement new dummy vram manager
Am 2023-06-14 um 06:38 schrieb Christian König: Am 10.05.23 um 00:01 schrieb Alex Deucher: From: Rajneesh Bhardwaj This adds dummy vram manager to support ASICs that do not have a dedicated or carvedout vram domain. Well that doesn't seem to make much sense. Why we should have that? TTM always expects a resource manager for VRAM. There are no NULL pointer checks in TTM for not having a resource manager for VRAM. The existing amdgpu_vram_mgr gets confused if there is no VRAM. It seemed cleaner to add a dummy manager than to scatter conditions for a memory-less GPU corner case through the regular VRAM manager. Regards, Felix Christian. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 67 ++-- 1 file changed, 60 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 43d6a9d6a538..89d35d194f2c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -370,6 +370,45 @@ int amdgpu_vram_mgr_query_page_status(struct amdgpu_vram_mgr *mgr, return ret; } +static void amdgpu_dummy_vram_mgr_debug(struct ttm_resource_manager *man, + struct drm_printer *printer) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr debug\n"); +} + +static bool amdgpu_dummy_vram_mgr_compatible(struct ttm_resource_manager *man, + struct ttm_resource *res, + const struct ttm_place *place, + size_t size) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr compatible\n"); + return false; +} + +static bool amdgpu_dummy_vram_mgr_intersects(struct ttm_resource_manager *man, + struct ttm_resource *res, + const struct ttm_place *place, + size_t size) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr intersects\n"); + return true; +} + +static void amdgpu_dummy_vram_mgr_del(struct ttm_resource_manager *man, + struct ttm_resource *res) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr deleted\n"); +} + +static int 
amdgpu_dummy_vram_mgr_new(struct ttm_resource_manager *man, + struct ttm_buffer_object *tbo, + const struct ttm_place *place, + struct ttm_resource **res) +{ + DRM_DEBUG_DRIVER("Dummy vram mgr new\n"); + return -ENOSPC; +} + /** * amdgpu_vram_mgr_new - allocate new ranges * @@ -817,6 +856,14 @@ static void amdgpu_vram_mgr_debug(struct ttm_resource_manager *man, mutex_unlock(>lock); } +static const struct ttm_resource_manager_func amdgpu_dummy_vram_mgr_func = { + .alloc = amdgpu_dummy_vram_mgr_new, + .free = amdgpu_dummy_vram_mgr_del, + .intersects = amdgpu_dummy_vram_mgr_intersects, + .compatible = amdgpu_dummy_vram_mgr_compatible, + .debug = amdgpu_dummy_vram_mgr_debug +}; + static const struct ttm_resource_manager_func amdgpu_vram_mgr_func = { .alloc = amdgpu_vram_mgr_new, .free = amdgpu_vram_mgr_del, @@ -841,17 +888,22 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev) ttm_resource_manager_init(man, >mman.bdev, adev->gmc.real_vram_size); - man->func = _vram_mgr_func; - - err = drm_buddy_init(>mm, man->size, PAGE_SIZE); - if (err) - return err; - mutex_init(>lock); INIT_LIST_HEAD(>reservations_pending); INIT_LIST_HEAD(>reserved_pages); mgr->default_page_size = PAGE_SIZE; + if (!adev->gmc.is_app_apu) { + man->func = _vram_mgr_func; + + err = drm_buddy_init(>mm, man->size, PAGE_SIZE); + if (err) + return err; + } else { + man->func = _dummy_vram_mgr_func; + DRM_INFO("Setup dummy vram mgr\n"); + } + ttm_set_driver_manager(>mman.bdev, TTM_PL_VRAM, >manager); ttm_resource_manager_set_used(man, true); return 0; @@ -886,7 +938,8 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) drm_buddy_free_list(>mm, >allocated); kfree(rsv); } - drm_buddy_fini(>mm); + if (!adev->gmc.is_app_apu) + drm_buddy_fini(>mm); mutex_unlock(>lock); ttm_resource_manager_cleanup(man);
Re: [PATCH] drm/amdkfd: Switch over to memdup_user()
Am 2023-06-13 um 22:04 schrieb Jiapeng Chong: Use memdup_user() rather than duplicating its implementation. This is a little bit restricted to reduce false positives. ./drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:2813:13-20: WARNING opportunity for memdup_user. Reported-by: Abaci Robot Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5523 Signed-off-by: Jiapeng Chong Kernel test robot is reporting a failure with this patch, looks like you used PTR_ERR incorrectly. Please make sure your patch compiles without warnings. I see more opportunities to use memdup_user in kfd_chardev.c, kfd_events.c, kfd_process_queue_manager.c and kfd_svm.c. Do you want to fix those, too, while you're at it? Thanks, Felix --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index d6b15493fffd..637962d4083c 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2810,12 +2810,9 @@ static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array if (!usr_queue_id_array) return NULL; - queue_ids = kzalloc(array_size, GFP_KERNEL); - if (!queue_ids) - return ERR_PTR(-ENOMEM); - - if (copy_from_user(queue_ids, usr_queue_id_array, array_size)) - return ERR_PTR(-EFAULT); + queue_ids = memdup_user(usr_queue_id_array, array_size); + if (IS_ERR(queue_ids)) + return PTR_ERR(queue_ids); return queue_ids; }
Re: [PATCH] drm/amdkfd: decrement queue count on mes queue destroy
On 2023-06-13 17:48, Jonathan Kim wrote: Queue count should decrement on queue destruction regardless of HWS support type. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 8a39a9e0ed5a..f515cb8f30ca 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -2089,8 +2089,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm, list_del(>list); qpd->queue_count--; if (q->properties.is_active) { + decrement_queue_count(dqm, qpd, q); if (!dqm->dev->kfd->shared_resources.enable_mes) { - decrement_queue_count(dqm, qpd, q); retval = execute_queues_cpsch(dqm, KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0, USE_DEFAULT_GRACE_PERIOD);
Re: [PATCH] drm/amdgpu/sdma4: set align mask to 255
Am 2023-06-07 um 12:31 schrieb Alex Deucher: The wptr needs to be incremented at at least 64 dword intervals, use 256 to align with windows. This should fix potential hangs with unaligned updates. Signed-off-by: Alex Deucher Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c index 1f83eebfc8a7..cd37f45e01a1 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c @@ -2312,7 +2312,7 @@ const struct amd_ip_funcs sdma_v4_0_ip_funcs = { static const struct amdgpu_ring_funcs sdma_v4_0_ring_funcs = { .type = AMDGPU_RING_TYPE_SDMA, - .align_mask = 0xf, + .align_mask = 0xff, .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP), .support_64bit_ptrs = true, .secure_submission_supported = true, @@ -2344,7 +2344,7 @@ static const struct amdgpu_ring_funcs sdma_v4_0_ring_funcs = { static const struct amdgpu_ring_funcs sdma_v4_0_page_ring_funcs = { .type = AMDGPU_RING_TYPE_SDMA, - .align_mask = 0xf, + .align_mask = 0xff, .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP), .support_64bit_ptrs = true, .secure_submission_supported = true, diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c index 8eebf9c2bbcd..05bb0691ee0e 100644 --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c @@ -1823,7 +1823,7 @@ const struct amd_ip_funcs sdma_v4_4_2_ip_funcs = { static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = { .type = AMDGPU_RING_TYPE_SDMA, - .align_mask = 0xf, + .align_mask = 0xff, .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP), .support_64bit_ptrs = true, .get_rptr = sdma_v4_4_2_ring_get_rptr, @@ -1854,7 +1854,7 @@ static const struct amdgpu_ring_funcs sdma_v4_4_2_ring_funcs = { static const struct amdgpu_ring_funcs sdma_v4_4_2_page_ring_funcs = { .type = 
AMDGPU_RING_TYPE_SDMA, - .align_mask = 0xf, + .align_mask = 0xff, .nop = SDMA_PKT_NOP_HEADER_OP(SDMA_OP_NOP), .support_64bit_ptrs = true, .get_rptr = sdma_v4_4_2_ring_get_rptr,
Re: [PATCHv2] drm/amdgpu: Update invalid PTE flag setting
Am 2023-06-12 um 12:23 schrieb Mukul Joshi: Update the invalid PTE flag setting with TF enabled. This is to ensure, in addition to transitioning the retry fault to a no-retry fault, it also causes the wavefront to enter the trap handler. With the current setting, the fault only transitions to a no-retry fault. Additionally, have 2 sets of invalid PTE settings, one for TF enabled, the other for TF disabled. The setting with TF disabled, doesn't work with TF enabled. Signed-off-by: Mukul Joshi --- v1->v2: - Update handling according to Christian's feedback. drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 7 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 6 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 +++ drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 11 +++ 5 files changed, 28 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index 6794edd1d2d2..e5c6b075fbbb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -152,6 +152,10 @@ struct amdgpu_gmc_funcs { void (*override_vm_pte_flags)(struct amdgpu_device *dev, struct amdgpu_vm *vm, uint64_t addr, uint64_t *flags); + /* update no-retry flags */ + void (*update_vm_pte_noretry_flags)(struct amdgpu_device *dev, + uint64_t *flags); + /* get the amount of memory used by the vbios for pre-OS console */ unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev); @@ -343,6 +347,9 @@ struct amdgpu_gmc { #define amdgpu_gmc_override_vm_pte_flags(adev, vm, addr, pte_flags) \ (adev)->gmc.gmc_funcs->override_vm_pte_flags \ ((adev), (vm), (addr), (pte_flags)) +#define amdgpu_gmc_update_vm_pte_noretry_flags(adev, pte_flags) \ + ((adev)->gmc.gmc_funcs->update_vm_pte_noretry_flags \ + ((adev), (pte_flags))) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) /** diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1cb14ea18cd9..ff9db7e5c086 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2583,7 +2583,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid, /* Intentionally setting invalid PTE flag * combination to force a no-retry-fault */ - flags = AMDGPU_PTE_SNOOPED | AMDGPU_PTE_PRT; + flags = AMDGPU_VM_NORETRY_FLAGS; value = 0; } else if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) { /* Redirect the access to the dummy page */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index 9c85d494f2a2..b81fcb962d8f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h @@ -84,7 +84,13 @@ struct amdgpu_mem_stats; /* PDE Block Fragment Size for VEGA10 */ #define AMDGPU_PDE_BFS(a) ((uint64_t)a << 59) +/* Flag combination to set no-retry with TF disabled */ +#define AMDGPU_VM_NORETRY_FLAGS(AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE | \ + AMDGPU_PTE_TF) +/* Flag combination to set no-retry with TF enabled */ +#define AMDGPU_VM_NORETRY_FLAGS_TF (AMDGPU_PTE_VALID | AMDGPU_PTE_SYSTEM | \ + AMDGPU_PTE_PRT) /* For GFX9 */ #define AMDGPU_PTE_MTYPE_VG10(a) ((uint64_t)(a) << 57) #define AMDGPU_PTE_MTYPE_VG10_MASKAMDGPU_PTE_MTYPE_VG10(3ULL) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c index dea1a64be44d..39f1650f6d00 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c @@ -804,6 +804,9 @@ static void amdgpu_vm_pte_update_flags(struct amdgpu_vm_update_params *params, flags |= AMDGPU_PTE_EXECUTABLE; } + if (adev->gmc.translate_further && level == AMDGPU_VM_PTB) + amdgpu_gmc_update_vm_pte_noretry_flags(adev, ); Don't you need a check that ((adev)->gmc.gmc_funcs->update_vm_pte_noretry_flags is not NULL? But adding a new callback for this may be overkill. 
Since the AMDGPU_VM_NORETRY_FLAGS(_TF) are defined in a non-HW-specific header file, you can probably implement the application of those flags in amdgpu_vm_pte_update_flags directly. Regards, Felix + /* APUs mapping system memory may need different MTYPEs on different * NUMA nodes. Only do this for contiguous ranges that can be assumed * to be on the same NUMA node. diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
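The no-retry flag sets discussed above are plain bit combinations, so the per-ASIC hook amounts to masking bits in or out. A hedged sketch of that idea (the bit positions and the `update_pte_noretry_flags` helper below are invented for illustration; the real values and the actual gmc_v9_0 callback logic live in amdgpu_vm.h and gmc_v9_0.c):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for illustration only; the real values live in
 * amdgpu_vm.h and differ between ASIC generations. */
#define PTE_VALID      (1ULL << 0)
#define PTE_SYSTEM     (1ULL << 1)
#define PTE_EXECUTABLE (1ULL << 4)
#define PDE_PTE        (1ULL << 54)
#define PTE_TF         (1ULL << 56)

/* No-retry combination used when translate-further (TF) is disabled,
 * mirroring the role of AMDGPU_VM_NORETRY_FLAGS in the patch. */
#define VM_NORETRY_FLAGS (PTE_EXECUTABLE | PDE_PTE | PTE_TF)

/* Loose model of the per-ASIC callback: when TF is enabled, strip the
 * TF-disabled no-retry bits from a PTB-level entry so the two flag sets
 * cannot be mixed. Not the driver's actual logic. */
static uint64_t update_pte_noretry_flags(bool translate_further, uint64_t flags)
{
    if (translate_further)
        flags &= ~VM_NORETRY_FLAGS;
    return flags;
}
```

The point of Felix's suggestion is that since the flag combinations are generic, this masking could happen directly in amdgpu_vm_pte_update_flags instead of behind a callback.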
Re: [PATCH] drm/amdkfd: fix null queue check on debug setting exceptions
On 2023-06-12 11:46, Jonathan Kim wrote:

The null check should be done on the queue struct itself and not on the process queue list node.

Signed-off-by: Jonathan Kim
Reviewed-by: Felix Kuehling

---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index cd34e7aaead4..fff3ccc04fa9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1097,7 +1097,7 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
 	pqm = &target->pqm;
 	list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
-		if (!pqn)
+		if (!pqn->q)
 			continue;
 		found_mask |= pqn->q->properties.exception_status;
Re: [PATCH v5 3/5] drm/amdkfd: set activated flag true when event age doesn't match
Testing for intermittent failures or race conditions is not easy. If we create such a test, we need to make sure it can catch the problem when not using the event ages, just to know that the test is good enough. I guess it could be a parametrized test that can run with or without event age. Without event age, we'd expect to catch a timeout. Not catching a timeout would be a test failure (indicating that the test is not good enough). With event age it should not time out, i.e. a timeout would be considered a failure in this case (indicating a problem with the event age mechanism).

That said, I'd feel better about a ROCr test that doesn't just cover the KFD event age mechanism, but also its use in the ROCr implementation of HSA signal waiting.

Regards,
  Felix

On 2023-06-12 12:19, Yat Sin, David wrote:

The current ROCr patches already address my previous feedback. I am OK with the current ROCr patches. Currently, there is no ROCrtst that would stress this multiple-waiters issue. I was thinking of something like the KFDTest, but calling the waiters from different threads. @Zhu, James Would you have time to look into this?

~David

-----Original Message-----
From: Kuehling, Felix
Sent: Friday, June 9, 2023 6:44 PM
To: Zhu, James; amd-gfx@lists.freedesktop.org
Cc: Yat Sin, David; Zhu, James
Subject: Re: [PATCH v5 3/5] drm/amdkfd: set activated flag true when event age doesn't match

From the KFD perspective, the series is Reviewed-by: Felix Kuehling

David, I looked at the ROCr and Thunk changes as well, and they look reasonable to me. Do you have any feedback on these patches from a ROCr point of view? Is there a reasonable stress test that could be used to check that this handles the race conditions as expected and allows all waiters to sleep?

Regards,
  Felix

On 2023-06-09 16:43, James Zhu wrote:

Set the waiter's activated flag to true when the event age doesn't match last_event_age.
-v4: add event type check -v5: improve on event age enable and activated flags Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index c7689181cc22..b2586a1dd35d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -41,6 +41,7 @@ struct kfd_event_waiter { wait_queue_entry_t wait; struct kfd_event *event; /* Event to wait for */ bool activated; /* Becomes true when event is signaled */ + bool event_age_enabled; /* set to true when last_event_age is +non-zero */ }; /* @@ -797,9 +798,9 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) static int init_event_waiter(struct kfd_process *p, struct kfd_event_waiter *waiter, - uint32_t event_id) + struct kfd_event_data *event_data) { - struct kfd_event *ev = lookup_event_by_id(p, event_id); + struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id); if (!ev) return -EINVAL; @@ -808,6 +809,15 @@ static int init_event_waiter(struct kfd_process *p, waiter->event = ev; waiter->activated = ev->signaled; ev->signaled = ev->signaled && !ev->auto_reset; + + /* last_event_age = 0 reserved for backward compatible */ + if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL && + event_data->signal_event_data.last_event_age) { + waiter->event_age_enabled = true; + if (ev->event_age != event_data- signal_event_data.last_event_age) + waiter->activated = true; + } + if (!waiter->activated) add_wait_queue(>wq, >wait); spin_unlock(>lock); @@ -948,8 +958,7 @@ int kfd_wait_on_events(struct kfd_process *p, goto out_unlock; } - ret = init_event_waiter(p, _waiters[i], - event_data.event_id); + ret = init_event_waiter(p, _waiters[i], _data); if (ret) goto out_unlock; }
Re: [PATCH v2] gpu: drm/amd: Remove the redundant null pointer check in list_for_each_entry() loops
[+Jon]

On 2023-06-12 07:58, Lu Hongfei wrote:

The pqn bound in the list_for_each_entry loop will not be NULL, so there is no need to check whether pqn is NULL. Thus remove a redundant null pointer check.

Signed-off-by: Lu Hongfei

---
The filename of the previous version was:
0001-gpu-drm-amd-Fix-the-bug-in-list_for_each_entry-loops.patch
The modifications made compared to the previous version are as follows:
1. Modified the patch title
2. "Thus remove a redundant null pointer check." is used instead of "We could remove this check."

 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index cd34e7aaead4..10d0cef844f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -1097,9 +1097,6 @@ void kfd_dbg_set_enabled_debug_exception_mask(struct kfd_process *target,
 	pqm = &target->pqm;
 	list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
-		if (!pqn)
-			continue;
-
 		found_mask |= pqn->q->properties.exception_status;
 	}

Right, this check doesn't make a lot of sense. Jon, was this meant to check pqn->q?

Regards,
  Felix
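The reason the check is redundant: list_for_each_entry() derives its cursor from list nodes via container_of() and terminates by comparing against the list head, so the cursor is never NULL inside the loop body. A self-contained model of that mechanic (the kernel macro uses typeof(); this portable sketch passes the entry type explicitly):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive list, mimicking the kernel's list.h. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Loop ends when the cursor's node wraps back to the head -- no NULL
 * sentinel is ever produced. */
#define list_for_each_entry(pos, head, type, member)              \
    for (pos = container_of((head)->next, type, member);          \
         &pos->member != (head);                                  \
         pos = container_of(pos->member.next, type, member))

struct queue { int id; struct list_head node; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev; n->next = h;
    h->prev->next = n; h->prev = n;
}

/* Count entries; the cursor assertion mirrors the redundant "if (!pqn)". */
static int count_entries(struct list_head *head)
{
    struct queue *pos;
    int n = 0;

    list_for_each_entry(pos, head, struct queue, node) {
        assert(pos != NULL); /* always holds */
        n++;
    }
    return n;
}
```

The check that does make sense is on pqn->q, which can legitimately be NULL for queues without an attached queue object — which is exactly what Jonathan Kim's follow-up fix does.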
Re: [PATCH v5 3/5] drm/amdkfd: set activated flag true when event age doesn't match
From the KFD perspective, the series is Reviewed-by: Felix Kuehling David, I looked at the ROCr and Thunk changes as well, and they look reasonable to me. Do you have any feedback on these patches from a ROCr point of view? Is there a reasonable stress test that could be used check that this handles the race conditions as expected and allows all waiters to sleep? Regards, Felix On 2023-06-09 16:43, James Zhu wrote: Set waiter's activated flag true when event age unmatchs with last_event_age. -v4: add event type check -v5: improve on event age enable and activated flags Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index c7689181cc22..b2586a1dd35d 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -41,6 +41,7 @@ struct kfd_event_waiter { wait_queue_entry_t wait; struct kfd_event *event; /* Event to wait for */ bool activated; /* Becomes true when event is signaled */ + bool event_age_enabled; /* set to true when last_event_age is non-zero */ }; /* @@ -797,9 +798,9 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) static int init_event_waiter(struct kfd_process *p, struct kfd_event_waiter *waiter, - uint32_t event_id) + struct kfd_event_data *event_data) { - struct kfd_event *ev = lookup_event_by_id(p, event_id); + struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id); if (!ev) return -EINVAL; @@ -808,6 +809,15 @@ static int init_event_waiter(struct kfd_process *p, waiter->event = ev; waiter->activated = ev->signaled; ev->signaled = ev->signaled && !ev->auto_reset; + + /* last_event_age = 0 reserved for backward compatible */ + if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL && + event_data->signal_event_data.last_event_age) { + waiter->event_age_enabled = true; + if (ev->event_age != 
event_data->signal_event_data.last_event_age) + waiter->activated = true; + } + if (!waiter->activated) add_wait_queue(>wq, >wait); spin_unlock(>lock); @@ -948,8 +958,7 @@ int kfd_wait_on_events(struct kfd_process *p, goto out_unlock; } - ret = init_event_waiter(p, _waiters[i], - event_data.event_id); + ret = init_event_waiter(p, _waiters[i], _data); if (ret) goto out_unlock; }
Re: [PATCH v4 3/5] drm/amdkfd: set activated flag true when event age doesn't match
On 2023-06-09 16:13, James Zhu wrote: Set waiter's activated flag true when event age unmatchs with last_event_age. -v4: add event type check Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index c7689181cc22..2cc1a7e976f4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -41,6 +41,7 @@ struct kfd_event_waiter { wait_queue_entry_t wait; struct kfd_event *event; /* Event to wait for */ bool activated; /* Becomes true when event is signaled */ + bool event_age_enabled; /* set to true when last_event_age is non-zero */ }; /* @@ -797,9 +798,9 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) static int init_event_waiter(struct kfd_process *p, struct kfd_event_waiter *waiter, - uint32_t event_id) + struct kfd_event_data *event_data) { - struct kfd_event *ev = lookup_event_by_id(p, event_id); + struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id); if (!ev) return -EINVAL; @@ -808,6 +809,13 @@ static int init_event_waiter(struct kfd_process *p, waiter->event = ev; waiter->activated = ev->signaled; ev->signaled = ev->signaled && !ev->auto_reset; + + /* last_event_age = 0 reserved for backward compatible */ + waiter->event_age_enabled = !!event_data->signal_event_data.last_event_age; This should also be inside the "if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL)". I'd do something like this: if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL && event_data->signal_event_data.last_event_age) { waiter->event_age_enabled = true; if (ev->event_age != event_data->signal_event_data.last_event_age) waiter->activated = true; } You don't need WRITE_ONCE here because there can be no concurrent access before you add the waiter to the wait queue. 
Regards, Felix + if (waiter->event->type == KFD_EVENT_TYPE_SIGNAL && waiter->event_age_enabled && + ev->event_age != event_data->signal_event_data.last_event_age) + WRITE_ONCE(waiter->activated, true); + if (!waiter->activated) add_wait_queue(>wq, >wait); spin_unlock(>lock); @@ -948,8 +956,7 @@ int kfd_wait_on_events(struct kfd_process *p, goto out_unlock; } - ret = init_event_waiter(p, _waiters[i], - event_data.event_id); + ret = init_event_waiter(p, _waiters[i], _data); if (ret) goto out_unlock; }
Re: [PATCH v3 3/5] drm/amdkfd: set activated flag true when event age doesn't match
On 2023-06-08 13:07, James Zhu wrote: Set waiter's activated flag true when event age unmatchs with last_event_age. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index c7689181cc22..4c6907507190 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c @@ -41,6 +41,7 @@ struct kfd_event_waiter { wait_queue_entry_t wait; struct kfd_event *event; /* Event to wait for */ bool activated; /* Becomes true when event is signaled */ + bool event_age_enabled; /* set to true when last_event_age is non-zero */ }; /* @@ -797,9 +798,9 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events) static int init_event_waiter(struct kfd_process *p, struct kfd_event_waiter *waiter, - uint32_t event_id) + struct kfd_event_data *event_data) { - struct kfd_event *ev = lookup_event_by_id(p, event_id); + struct kfd_event *ev = lookup_event_by_id(p, event_data->event_id); if (!ev) return -EINVAL; @@ -808,6 +809,13 @@ static int init_event_waiter(struct kfd_process *p, waiter->event = ev; waiter->activated = ev->signaled; ev->signaled = ev->signaled && !ev->auto_reset; + + /* last_event_age = 0 reserved for backward compatible */ + waiter->event_age_enabled = !!event_data->signal_event_data.last_event_age; + if (waiter->event_age_enabled && + ev->event_age != event_data->signal_event_data.last_event_age) + WRITE_ONCE(waiter->activated, true); This needs to check the event type. Looking at event_data->signal_event_data when this is not a signal event is illegal, because it is aliased in a union with other event type data. Other than that, the series looks good to me now. 
Regards, Felix + if (!waiter->activated) add_wait_queue(>wq, >wait); spin_unlock(>lock); @@ -948,8 +956,7 @@ int kfd_wait_on_events(struct kfd_process *p, goto out_unlock; } - ret = init_event_waiter(p, _waiters[i], - event_data.event_id); + ret = init_event_waiter(p, _waiters[i], _data); if (ret) goto out_unlock; }
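The aliasing hazard Felix points out — reading signal_event_data when the event is not a signal event — can be shown with a reduced model of the union. Field names are abbreviated, and the explicit type argument below is an assumption for the demo; in KFD the type is looked up from the event object, not stored in the ioctl struct:

```c
#include <assert.h>
#include <stdint.h>

enum ev_type { EV_TYPE_SIGNAL, EV_TYPE_MEMORY };

struct mem_exception_data { uint64_t va; uint32_t gpu_id; };
struct signal_event_data  { uint64_t last_event_age; };

/* Reduced sketch of kfd_event_data: the per-type payloads share storage. */
struct event_data {
    union {
        struct mem_exception_data memory_exception_data;
        struct signal_event_data  signal_event_data;
    };
    uint32_t event_id;
};

/* Only read signal_event_data for signal events; for any other type the
 * same bytes hold a different member of the union. */
static uint64_t last_event_age_or_zero(const struct event_data *d,
                                       enum ev_type t)
{
    return t == EV_TYPE_SIGNAL ? d->signal_event_data.last_event_age : 0;
}
```

This is why the v4/v5 revisions gate the last_event_age access behind a KFD_EVENT_TYPE_SIGNAL check.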
Re: [PATCH v2 10/12] drm/amdgpu: remove unused functions and variables
On 2023-04-12 12:25, Shashank Sharma wrote: This patch removes some variables and functions from KFD doorbell handling code, which are no more required since doorbell manager is handling doorbell calculations. Cc: Alex Deucher Cc: Christian Koenig Signed-off-by: Shashank Sharma Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 32 --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 12 - 2 files changed, 44 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c index 718cfe9cb4f5..f4088cfd52db 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c @@ -193,38 +193,6 @@ void write_kernel_doorbell64(void __iomem *db, u64 value) } } -unsigned int kfd_get_doorbell_dw_offset_in_bar(struct kfd_dev *kfd, - struct kfd_process_device *pdd, - unsigned int doorbell_id) -{ - /* -* doorbell_base_dw_offset accounts for doorbells taken by KGD. -* index * kfd_doorbell_process_slice/sizeof(u32) adjusts to -* the process's doorbells. The offset returned is in dword -* units regardless of the ASIC-dependent doorbell size. 
-*/ - if (!kfd->shared_resources.enable_mes) - return kfd->doorbell_base_dw_offset + - pdd->doorbell_index - * kfd_doorbell_process_slice(kfd) / sizeof(u32) + - doorbell_id * - kfd->device_info.doorbell_size / sizeof(u32); - else - return amdgpu_mes_get_doorbell_dw_offset_in_bar( - (struct amdgpu_device *)kfd->adev, - pdd->doorbell_index, doorbell_id); -} - -uint64_t kfd_get_number_elems(struct kfd_dev *kfd) -{ - uint64_t num_of_elems = (kfd->shared_resources.doorbell_aperture_size - - kfd->shared_resources.doorbell_start_offset) / - kfd_doorbell_process_slice(kfd) + 1; - - return num_of_elems; - -} - phys_addr_t kfd_get_process_doorbells(struct kfd_process_device *pdd) { struct amdgpu_device *adev = pdd->dev->adev; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index dfff77379acb..1bc6a8ed8cda 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -257,15 +257,6 @@ struct kfd_dev { unsigned int id; /* topology stub index */ - phys_addr_t doorbell_base; /* Start of actual doorbells used by -* KFD. It is aligned for mapping -* into user mode -*/ - size_t doorbell_base_dw_offset; /* Offset from the start of the PCI -* doorbell BAR to the first KFD -* doorbell in dwords. GFX reserves -* the segment before this offset. -*/ u32 __iomem *doorbell_kernel_ptr; /* This is a pointer for a doorbells * page used by kernel queue */ @@ -276,8 +267,6 @@ struct kfd_dev { const struct kfd2kgd_calls *kfd2kgd; struct mutex doorbell_mutex; - DECLARE_BITMAP(doorbell_available_index, - KFD_MAX_NUM_OF_QUEUES_PER_PROCESS); void *gtt_mem; uint64_t gtt_start_gpu_addr; @@ -754,7 +743,6 @@ struct kfd_process_device { struct attribute attr_evict; struct kobject *kobj_stats; - unsigned int doorbell_index; /* * @cu_occupancy: Reports occupancy of Compute Units (CU) of a process
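For reference, the arithmetic the removed kfd_get_doorbell_dw_offset_in_bar() performed in the non-MES case can be sketched as a dword offset built from three parts: the KGD-reserved base, the process's doorbell slice, and the doorbell's position within that slice. The sizes below are plain parameters; in the driver they come from the device info and shared_resources:

```c
#include <assert.h>
#include <stdint.h>

/* Dword offset into the doorbell BAR, simplified model of the removed
 * helper. All byte sizes are converted to dword units (sizeof(uint32_t)),
 * matching the comment in the deleted code. */
static uint32_t doorbell_dw_offset(uint32_t base_dw_offset,
                                   uint32_t process_index,
                                   uint32_t process_slice_bytes,
                                   uint32_t doorbell_id,
                                   uint32_t doorbell_size_bytes)
{
    return base_dw_offset +
           process_index * (process_slice_bytes / sizeof(uint32_t)) +
           doorbell_id * (doorbell_size_bytes / sizeof(uint32_t));
}
```

With the doorbell manager in place, this bookkeeping moves into amdgpu, which is why the helper and the doorbell_base/doorbell_base_dw_offset fields can be deleted.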
Re: [PATCH v2 08/12] drm/amdgpu: use doorbell manager for kfd kernel doorbells
On 2023-04-25 15:59, Shashank Sharma wrote:

On 24/04/2023 21:56, Felix Kuehling wrote:

On 2023-04-22 2:39, Shashank Sharma wrote:

- KFD process level doorbells: doorbell pages which are allocated by the kernel but mapped and written by userspace processes, saved in struct pdd->qpd->doorbells, size = kfd_doorbell_process_slice. We realized that we only need 1-2 doorbells for KFD kernel level stuff (so kept it at one page), but need two pages of doorbells per KFD process, so they are sized accordingly. We have also run the KFD test suite and verified the changes for regressions. Hope this helps in explaining the design.

Right, I missed that this was only for kernel doorbells. I wonder whether KFD really needs its own page here. I think we only need a doorbell for HWS. And when we use MES, I think even that isn't needed because MES packet submissions go through amdgpu. So maybe KFD doesn't need its own kernel-mode doorbell page any more on systems with user graphics mode queues.

Yeah, for any IP with MES enabled, KFD doesn't need kernel level doorbells. But I still allocated a page just to make sure we do not break any non-MES platforms or use cases where MES is deliberately disabled from the kernel command line. Hope that works for you.

- Shashank

Even without MES, we still only need one doorbell for HWS. Allocating a whole page for that is wasteful. Anyway, I'm OK with cleaning that up later.

Regards,
  Felix
Re: [PATCH] drm/amdkfd: fix and enable debugging for gfx11
On 2023-06-07 16:20, Jonathan Kim wrote: There are a couple of fixes required to enable gfx11 debugging. First, ADD_QUEUE.trap_en is an inappropriate place to toggle a per-process register so move it to SET_SHADER_DEBUGGER.trap_en. When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize the SET_SHADER_DEBUGGER.trap_en setting. Second, to preserve correct save/restore priviledged wave states in coordination with the trap enablement setting, resume suspended waves early in the disable call. NOTE: The AMDGPU_MES_VERSION_MASK check is a place holder as MES FW updates have been reviewed but is awaiting binary creation. Once the binaries have been created, this check may be subject to change. v2: do a trap_en safety check in case old mes doesn't accept unused trap_en d-word. remove unnecessary process termination work around. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c| 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h| 4 +++- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 14 ++ .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +-- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 12 +++- drivers/gpu/drm/amd/include/mes_v11_api_def.h | 1 + 7 files changed, 25 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 20cc3fffe921..e9091ebfe230 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -928,7 +928,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, uint64_t process_context_addr, uint32_t spi_gdbg_per_vmid_cntl, const uint32_t *tcp_watch_cntl, - uint32_t flags) + uint32_t flags, + bool trap_en) { struct mes_misc_op_input op_input = {0}; int r; @@ -945,6 +946,10 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, memcpy(op_input.set_shader_debugger.tcp_watch_cntl, tcp_watch_cntl, sizeof(op_input.set_shader_debugger.tcp_watch_cntl)); + if 
(((adev->mes.sched_version & AMDGPU_MES_API_VERSION_MASK) >> + AMDGPU_MES_API_VERSION_SHIFT) >= 14) + op_input.set_shader_debugger.trap_en = trap_en; + It's probably too late to change the GFX11 MES API at this point. But why didn't they just add a trap_en bit in the existing flags field? That could have avoided the need for the compatibility checks. Anyway, the patch is Reviewed-by: Felix Kuehling amdgpu_mes_lock(>mes); r = adev->mes.funcs->misc_op(>mes, _input); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index b5f5eed2b5ef..2d6ac30b7135 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -294,6 +294,7 @@ struct mes_misc_op_input { } flags; uint32_t spi_gdbg_per_vmid_cntl; uint32_t tcp_watch_cntl[4]; + uint32_t trap_en; } set_shader_debugger; }; }; @@ -361,7 +362,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, uint64_t process_context_addr, uint32_t spi_gdbg_per_vmid_cntl, const uint32_t *tcp_watch_cntl, - uint32_t flags); + uint32_t flags, + bool trap_en); int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, int queue_type, int idx, diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index c4e3cb8d44de..1bdaa00c0b46 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -347,6 +347,7 @@ static int mes_v11_0_misc_op(struct amdgpu_mes *mes, memcpy(misc_pkt.set_shader_debugger.tcp_watch_cntl, input->set_shader_debugger.tcp_watch_cntl, sizeof(misc_pkt.set_shader_debugger.tcp_watch_cntl)); + misc_pkt.set_shader_debugger.trap_en = input->set_shader_debugger.trap_en; break; default: DRM_ERROR("unsupported misc op (%d) \n", input->op); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 125274445f43..cd34e7aaead4 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c 
@@ -349,12 +349,13 @@ int kfd_dbg_set
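The firmware gate in this v2 extracts a packed API-version field from sched_version before sending trap_en. A sketch of that extraction with invented mask/shift values (the real AMDGPU_MES_API_VERSION_MASK and AMDGPU_MES_API_VERSION_SHIFT are defined in amdgpu_mes.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout: an 8-bit API version packed at bit 12.
 * Only the extract-and-compare pattern matches the patch. */
#define MES_API_VERSION_SHIFT 12
#define MES_API_VERSION_MASK  (0xFFu << MES_API_VERSION_SHIFT)

/* trap_en may only be sent when the packed API-version field is >= 14,
 * mirroring the version check in amdgpu_mes_set_shader_debugger(). */
static bool mes_accepts_trap_en(uint32_t sched_version)
{
    return ((sched_version & MES_API_VERSION_MASK) >>
            MES_API_VERSION_SHIFT) >= 14;
}
```

Felix's question stands regardless of the layout: a trap_en bit in the existing flags field would have made the struct extension backward compatible without any version check.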
Re: [PATCH] drm/amdkfd: optimize gfx off enable toggle for debugging
On 2023-06-07 13:32, Jonathan Kim wrote: Legacy debug devices limited to pinning a single debug VMID for debugging are the only devices that require disabling GFX OFF while accessing debug registers. Debug devices that support multi-process debugging rely on the hardware scheduler to update debug registers and do not run into GFX OFF access issues. Remove KFD GFX OFF enable toggle clutter by moving these calls into the KGD debug calls themselves. v2: toggle gfx off around address watch hi/lo settings as well. Signed-off-by: Jonathan Kim --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 4 +++ .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c | 7 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c| 33 ++- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 4 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 24 ++ drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 22 +++-- drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 21 +--- Looks like you missed one amdgpu_amdkfd_gfx_off_ctrl call in kfd_process.c. 7 files changed, 77 insertions(+), 38 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c index 60f9e027fb66..1f0e6ec56618 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c @@ -150,6 +150,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( VALID, 1); + amdgpu_gfx_off_ctrl(adev, false); + Aldebaran doesn't use automatic gfxoff, so this should not be needed. 
WREG32_RLC((SOC15_REG_OFFSET(GC, 0, regTCP_WATCH0_ADDR_H) + (watch_id * TCP_WATCH_STRIDE)), watch_address_high); @@ -158,6 +160,8 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch( (watch_id * TCP_WATCH_STRIDE)), watch_address_low); + amdgpu_gfx_off_ctrl(adev, true); + return watch_address_cntl; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c index 625db444df1c..a4e28d547173 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c @@ -350,6 +350,8 @@ static uint32_t kgd_arcturus_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { + amdgpu_gfx_off_ctrl(adev, false); + I would need to double check, but I believe Arcturus also doesn't support gfxoff. mutex_lock(>grbm_idx_mutex); kgd_gfx_v9_set_wave_launch_stall(adev, vmid, true); @@ -362,6 +364,8 @@ static uint32_t kgd_arcturus_enable_debug_trap(struct amdgpu_device *adev, mutex_unlock(>grbm_idx_mutex); + amdgpu_gfx_off_ctrl(adev, true); + return 0; } @@ -375,6 +379,7 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev, bool keep_trap_enabled, uint32_t vmid) { + amdgpu_gfx_off_ctrl(adev, false); mutex_lock(>grbm_idx_mutex); @@ -388,6 +393,8 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct amdgpu_device *adev, mutex_unlock(>grbm_idx_mutex); + amdgpu_gfx_off_ctrl(adev, true); + return 0; } const struct kfd2kgd_calls arcturus_kfd2kgd = { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c index 8ad7a7779e14..415928139861 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c @@ -754,12 +754,13 @@ uint32_t kgd_gfx_v10_enable_debug_trap(struct amdgpu_device *adev, bool restore_dbg_registers, uint32_t vmid) { + amdgpu_gfx_off_ctrl(adev, false); mutex_lock(>grbm_idx_mutex); 
kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true); - /* assume gfx off is disabled for the debug session if rlc restore not supported. */ + /* keep gfx off disabled for the debug session if rlc restore not supported. */ if (restore_dbg_registers) { uint32_t data = 0; @@ -784,6 +785,8 @@ uint32_t kgd_gfx_v10_enable_debug_trap(struct amdgpu_device *adev, mutex_unlock(>grbm_idx_mutex); + amdgpu_gfx_off_ctrl(adev, true); + return 0; } @@ -791,6 +794,8 @@ uint32_t kgd_gfx_v10_disable_debug_trap(struct amdgpu_device *adev, bool keep_trap_enabled, uint32_t vmid) { + amdgpu_gfx_off_ctrl(adev, false); + mutex_lock(>grbm_idx_mutex); kgd_gfx_v10_set_wave_launch_stall(adev, vmid, true); @@ -801,6 +806,16 @@ uint32_t
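The gfxoff toggling being moved into the KGD calls follows a disable/enable pairing contract: each amdgpu_gfx_off_ctrl(adev, false) must be balanced by a later amdgpu_gfx_off_ctrl(adev, true), and GFXOFF is only permitted again once every disabler has re-enabled it. A toy model of that contract (a sketch of the semantics, not the driver's actual bookkeeping, which also involves locking and delayed work):

```c
#include <assert.h>
#include <stdbool.h>

/* Reference-count model of the GFXOFF disable/enable discipline. */
struct gfx_off_model { int disable_count; };

static void gfx_off_ctrl(struct gfx_off_model *m, bool enable)
{
    if (enable) {
        assert(m->disable_count > 0); /* unbalanced enable is a bug */
        m->disable_count--;
    } else {
        m->disable_count++;
    }
}

static bool gfx_off_allowed(const struct gfx_off_model *m)
{
    return m->disable_count == 0;
}
```

This is also why Felix's review comment about a missed amdgpu_amdkfd_gfx_off_ctrl call in kfd_process.c matters: dropping one side of a pair leaves the count permanently unbalanced.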
Re: [PATCH] drm/amdkfd: fix and enable debugging for gfx11
On 2023-06-07 13:26, Jonathan Kim wrote: There are a few fixes required to enable gfx11 debugging. First, ADD_QUEUE.trap_en is an inappropriate place to toggle a per-process register so move it to SET_SHADER_DEBUGGER.trap_en. When ADD_QUEUE.skip_process_ctx_clear is set, MES will prioritize the SET_SHADER_DEBUGGER.trap_en setting. I see you have a firmware version check for enabling debugging. But is the struct SET_SHADER_DEBUGGER change safe with older firmware when debugging is disabled? Second, to preserve correct save/restore priviledged wave states in coordination with the trap enablement setting, resume suspended waves early in the disable call. Finally, displaced single stepping can cause non-fatal illegal instructions during process termination on debug disable. To work around this, stall the waves prior to disable and allow clean up to happen naturally on process termination. NOTE: The AMDGPU_MES_VERSION_MASK check is a place holder as MES FW updates have been reviewed but is awaiting binary creation. Once the binaries have been created, this check may be subject to change. 
Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 4 ++- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c| 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 31 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 12 --- drivers/gpu/drm/amd/include/mes_v11_api_def.h | 1 + 7 files changed, 40 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index 20cc3fffe921..95d69f9c7361 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c @@ -928,7 +928,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, uint64_t process_context_addr, uint32_t spi_gdbg_per_vmid_cntl, const uint32_t *tcp_watch_cntl, - uint32_t flags) + uint32_t flags, + bool trap_en) { struct mes_misc_op_input op_input = {0}; int r; @@ -945,6 +946,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, memcpy(op_input.set_shader_debugger.tcp_watch_cntl, tcp_watch_cntl, sizeof(op_input.set_shader_debugger.tcp_watch_cntl)); + op_input.set_shader_debugger.trap_en = trap_en; + amdgpu_mes_lock(>mes); r = adev->mes.funcs->misc_op(>mes, _input); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h index b5f5eed2b5ef..2d6ac30b7135 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h @@ -294,6 +294,7 @@ struct mes_misc_op_input { } flags; uint32_t spi_gdbg_per_vmid_cntl; uint32_t tcp_watch_cntl[4]; + uint32_t trap_en; } set_shader_debugger; }; }; @@ -361,7 +362,8 @@ int amdgpu_mes_set_shader_debugger(struct amdgpu_device *adev, uint64_t process_context_addr, uint32_t spi_gdbg_per_vmid_cntl, const uint32_t *tcp_watch_cntl, - uint32_t flags); + uint32_t flags, + bool trap_en); int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, int queue_type, int idx, diff --git 
a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index c4e3cb8d44de..1bdaa00c0b46 100644 --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c @@ -347,6 +347,7 @@ static int mes_v11_0_misc_op(struct amdgpu_mes *mes, memcpy(misc_pkt.set_shader_debugger.tcp_watch_cntl, input->set_shader_debugger.tcp_watch_cntl, sizeof(misc_pkt.set_shader_debugger.tcp_watch_cntl)); + misc_pkt.set_shader_debugger.trap_en = input->set_shader_debugger.trap_en; break; default: DRM_ERROR("unsupported misc op (%d) \n", input->op); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c index 125274445f43..e7bc07068eed 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c @@ -349,12 +349,30 @@ int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd) { uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd->spi_dbg_launch_mode; uint32_t flags = pdd->process->dbg_flags; + bool sq_trap_en = !!spi_dbg_cntl; if
Re: [PATCH v2 3/3] drm/amdkfd: don't sleep when event age unmatch
On 2023-06-06 12:24, James Zhu wrote:
> Don't sleep when event age unmatch, and update last_event_age. It is
> only for KFD_EVENT_TYPE_SIGNAL which is checked by user space.
>
> Signed-off-by: James Zhu 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 15 +++
>  1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index c7689181cc22..f4ceb5be78ed 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -952,6 +952,21 @@ int kfd_wait_on_events(struct kfd_process *p,
>  					       event_data.event_id);
>  		if (ret)
>  			goto out_unlock;
> +
> +		/* last_event_age = 0 reserved for backward compatible */
> +		if (event_data.signal_event_data.last_event_age &&
> +		    event_waiters[i].event->event_age !=
> +		    event_data.signal_event_data.last_event_age) {
> +			event_data.signal_event_data.last_event_age =
> +				event_waiters[i].event->event_age;

The event_age is updated in set_event under the event->spin_lock. You
need to take that lock for this check here as well. I think the easiest
way to do this would be to move the check into init_event_waiter. That
way you can initialize the waiter as activated if the event age is not
up to date.

> +			WRITE_ONCE(event_waiters[i].activated, true);
> +
> +			if (copy_to_user(&events[i], &event_data,
> +					 sizeof(struct kfd_event_data))) {
> +				ret = -EFAULT;
> +				goto out_unlock;
> +			}
> +		}

I think we also need to update the event age in event data after an
event has signaled. You should probably move updating and copying of
the event age to user mode into copy_signaled_event_data. That way it
would handle all the cases.

Regards,
  Felix

>  	}
>
>  	/* Check condition once. */
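Felix's locking point can be sketched in userspace with a pthread mutex standing in for the kernel event spinlock (all names here are illustrative, not the actual KFD code): the age comparison and the waiter activation must happen under the same lock that the signaling path takes when it bumps event_age, otherwise the check can race with set_event().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the KFD event structures, not the real
 * definitions from kfd_events.c. */
struct kfd_event_sketch {
	pthread_mutex_t lock;	/* models event->lock (a spinlock in the kernel) */
	uint64_t event_age;	/* bumped by the signaling path under that lock */
};

struct kfd_event_waiter_sketch {
	bool activated;
};

/* Do the age comparison under the same lock the signaler uses to bump
 * event_age, and activate the waiter on a mismatch so the wait returns
 * without sleeping. last_event_age is both read and written, mirroring
 * its "to and from KFD" role. Returns true when the waiter was
 * activated because the supplied age was stale. */
static bool init_waiter_check_age(struct kfd_event_sketch *ev,
				  struct kfd_event_waiter_sketch *w,
				  uint64_t *last_event_age)
{
	bool stale = false;

	pthread_mutex_lock(&ev->lock);
	if (*last_event_age && *last_event_age != ev->event_age) {
		*last_event_age = ev->event_age;  /* report current age back */
		w->activated = true;
		stale = true;
	}
	pthread_mutex_unlock(&ev->lock);

	return stale;
}
```

A zero last_event_age skips the check entirely, matching the "reserved for backward compatible" comment in the quoted patch.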
Re: [PATCH v2 1/3] drm/amdkfd: add event age tracking
On 2023-06-06 12:24, James Zhu wrote:
> Add event age tracking
>
> Signed-off-by: James Zhu 
> ---
>  include/uapi/linux/kfd_ioctl.h | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 1781e7669982..eeb2fdcbdcb7 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -39,9 +39,10 @@
>   * - 1.11 - Add unified memory for ctx save/restore area
>   * - 1.12 - Add DMA buf export ioctl
>   * - 1.13 - Add debugger API
> + * - 1.14 - Update kfd_event_data
>   */
>  #define KFD_IOCTL_MAJOR_VERSION 1
> -#define KFD_IOCTL_MINOR_VERSION 13
> +#define KFD_IOCTL_MINOR_VERSION 14

Bumping the version number should be done in the last patch in the
series, once the feature is fully enabled.

Regards,
  Felix

>  struct kfd_ioctl_get_version_args {
>  	__u32 major_version;	/* from KFD */
> @@ -320,12 +321,20 @@ struct kfd_hsa_hw_exception_data {
>  	__u32 gpu_id;
>  };
>
> +/* hsa signal event data */
> +struct kfd_hsa_signal_event_data {
> +	__u64 last_event_age;	/* to and from KFD */
> +};
> +
>  /* Event data */
>  struct kfd_event_data {
>  	union {
> +		/* From KFD */
>  		struct kfd_hsa_memory_exception_data memory_exception_data;
>  		struct kfd_hsa_hw_exception_data hw_exception_data;
> -	};			/* From KFD */
> +		/* To and From KFD */
> +		struct kfd_hsa_signal_event_data signal_event_data;
> +	};
>  	__u64 kfd_event_data_ext;	/* pointer to an extension
> 					   structure for future
> 					   exception types */
>  	__u32 event_id;		/* to KFD */
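The new last_event_age field is marked "to and from KFD": user space passes the event age it last observed, and the kernel can return from the wait immediately (writing the current age back) if the event has signaled since. A hedged sketch of that decision, with 0 reserved for old user mode that zero-fills the struct (the function name is hypothetical, not from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modeling the intended kernel-side check: compare
 * the age user space last observed against the event's current age. */
static bool wait_returns_immediately(uint64_t kernel_event_age,
				     uint64_t user_last_event_age)
{
	/* 0 is reserved: old user mode zero-fills kfd_event_data, so the
	 * kernel keeps the pre-1.14 behaviour and sleeps as before. */
	if (!user_last_event_age)
		return false;

	/* Any age mismatch means a signal raced in since the caller last
	 * looked, so there is no reason to sleep. */
	return kernel_event_age != user_last_event_age;
}
```

This closes the lost-wakeup window between user space reading the signal slot and entering the wait ioctl.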
Re: [PATCH] drm/amdkfd: Fix reserved SDMA queues handling
On 2023-06-07 11:27, Mukul Joshi wrote:
> This patch fixes a regression caused by a bad merge where the handling
> of reserved SDMA queues was accidentally removed. With the fix, the
> reserved SDMA queues are again correctly marked as unavailable for
> allocation.
>
> Fixes: c27842c84a848 ("drm/amdkfd: Update SDMA queue management for GFX9.4.3")
> Signed-off-by: Mukul Joshi 

Reviewed-by: Felix Kuehling 

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            | 13 ++---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 10 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  2 +-
>  3 files changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 9fc9d32cb579..9d4abfd8b55e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -106,20 +106,19 @@ static void kfd_device_info_set_sdma_info(struct kfd_dev *kfd)
>  		kfd->device_info.num_sdma_queues_per_engine = 8;
>  	}
>
> +	bitmap_zero(kfd->device_info.reserved_sdma_queues_bitmap, KFD_MAX_SDMA_QUEUES);
> +
>  	switch (sdma_version) {
>  	case IP_VERSION(6, 0, 0):
> +	case IP_VERSION(6, 0, 1):
>  	case IP_VERSION(6, 0, 2):
>  	case IP_VERSION(6, 0, 3):
>  		/* Reserve 1 for paging and 1 for gfx */
>  		kfd->device_info.num_reserved_sdma_queues_per_engine = 2;
>  		/* BIT(0)=engine-0 queue-0; BIT(1)=engine-1 queue-0; BIT(2)=engine-0 queue-1; ... */
> -		kfd->device_info.reserved_sdma_queues_bitmap = 0xFULL;
> -		break;
> -	case IP_VERSION(6, 0, 1):
> -		/* Reserve 1 for paging and 1 for gfx */
> -		kfd->device_info.num_reserved_sdma_queues_per_engine = 2;
> -		/* BIT(0)=engine-0 queue-0; BIT(1)=engine-0 queue-1; ... */
> -		kfd->device_info.reserved_sdma_queues_bitmap = 0x3ULL;
> +		bitmap_set(kfd->device_info.reserved_sdma_queues_bitmap, 0,
> +			   kfd->adev->sdma.num_instances *
> +			   kfd->device_info.num_reserved_sdma_queues_per_engine);
>  		break;
>  	default:
>  		break;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 0c1be91a87c6..498ad7d4e7d9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -123,11 +123,6 @@ unsigned int get_num_xgmi_sdma_queues(struct device_queue_manager *dqm)
>  		dqm->dev->kfd->device_info.num_sdma_queues_per_engine;
>  }
>
> -static inline uint64_t get_reserved_sdma_queues_bitmap(struct device_queue_manager *dqm)
> -{
> -	return dqm->dev->kfd->device_info.reserved_sdma_queues_bitmap;
> -}
> -
>  static void init_sdma_bitmaps(struct device_queue_manager *dqm)
>  {
>  	bitmap_zero(dqm->sdma_bitmap, KFD_MAX_SDMA_QUEUES);
> @@ -135,6 +130,11 @@ static void init_sdma_bitmaps(struct device_queue_manager *dqm)
>  	bitmap_zero(dqm->xgmi_sdma_bitmap, KFD_MAX_SDMA_QUEUES);
>  	bitmap_set(dqm->xgmi_sdma_bitmap, 0, get_num_xgmi_sdma_queues(dqm));
> +
> +	/* Mask out the reserved queues */
> +	bitmap_andnot(dqm->sdma_bitmap, dqm->sdma_bitmap,
> +		      dqm->dev->kfd->device_info.reserved_sdma_queues_bitmap,
> +		      KFD_MAX_SDMA_QUEUES);
>  }
>
>  void program_sh_mem_settings(struct device_queue_manager *dqm,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 023b17e0116b..7364a5d77c6e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -239,7 +239,7 @@ struct kfd_device_info {
>  	uint32_t no_atomic_fw_version;
>  	unsigned int num_sdma_queues_per_engine;
>  	unsigned int num_reserved_sdma_queues_per_engine;
> -	uint64_t reserved_sdma_queues_bitmap;
> +	DECLARE_BITMAP(reserved_sdma_queues_bitmap, KFD_MAX_SDMA_QUEUES);
> };
>
> unsigned int kfd_get_num_sdma_engines(struct kfd_node *kdev);
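The bitmap handling in this patch can be modeled in plain C: build the full queue bitmap, set the first num_engines * reserved_per_engine bits in the reserved bitmap, then mask them out, mirroring the bitmap_set()/bitmap_andnot() calls. This is a userspace sketch with a single 64-bit word standing in for DECLARE_BITMAP, and the engine/queue counts in the test below are hypothetical, not tied to any specific SDMA IP version.

```c
#include <stdint.h>

/* Userspace model of the fixed init_sdma_bitmaps() logic:
 * available queues = all queues & ~reserved queues. */
static uint64_t mask_reserved_sdma(unsigned int num_engines,
				   unsigned int queues_per_engine,
				   unsigned int reserved_per_engine)
{
	uint64_t avail = 0, reserved = 0;
	unsigned int i;

	/* bitmap_set(sdma_bitmap, 0, total number of queues) */
	for (i = 0; i < num_engines * queues_per_engine; i++)
		avail |= 1ULL << i;

	/* Per the patch's bit layout (BIT(0)=engine-0 queue-0,
	 * BIT(1)=engine-1 queue-0, BIT(2)=engine-0 queue-1, ...), the
	 * reserved queues occupy the first
	 * num_engines * reserved_per_engine bits:
	 * bitmap_set(reserved_sdma_queues_bitmap, 0, n). */
	for (i = 0; i < num_engines * reserved_per_engine; i++)
		reserved |= 1ULL << i;

	/* bitmap_andnot(sdma_bitmap, sdma_bitmap, reserved, ...) */
	return avail & ~reserved;
}
```

With 2 engines, 8 queues per engine, and 2 reserved per engine, the low four bits drop out of the available mask, which is exactly what the regression had stopped doing.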