The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.

amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with
doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size
configurations.

This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.

Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE,
which can waste memory if the system page size is large. In this patch,
wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated
from wg_data_size and the control stack size, is aligned to PAGE_SIZE.

Signed-off-by: Donet Tom <[email protected]>
---
 drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
index 2822c90bd7be..b97f4a51db6e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
@@ -444,10 +444,11 @@ void kfd_queue_ctx_save_restore_size(struct 
kfd_topology_device *dev)
                    min(cu_num * 40, props->array_count / 
props->simd_arrays_per_engine * 512)
                    : cu_num * 32;
 
-       wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props), 
PAGE_SIZE);
+       wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props),
+                               AMDGPU_GPU_PAGE_SIZE);
        ctl_stack_size = wave_num * CNTL_STACK_BYTES_PER_WAVE(gfxv) + 8;
        ctl_stack_size = ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER + 
ctl_stack_size,
-                              PAGE_SIZE);
+                              AMDGPU_GPU_PAGE_SIZE);
 
        if ((gfxv / 10000 * 10000) == 100000) {
                /* HW design limits control stack size to 0x7000.
@@ -459,7 +460,7 @@ void kfd_queue_ctx_save_restore_size(struct 
kfd_topology_device *dev)
 
        props->ctl_stack_size = ctl_stack_size;
        props->debug_memory_size = ALIGN(wave_num * DEBUGGER_BYTES_PER_WAVE, 
DEBUGGER_BYTES_ALIGN);
-       props->cwsr_size = ctl_stack_size + wg_data_size;
+       props->cwsr_size = ALIGN(ctl_stack_size + wg_data_size, PAGE_SIZE);
 
        if (gfxv == 80002)      /* GFX_VERSION_TONGA */
                props->eop_buffer_size = 0x8000;
-- 
2.52.0

Reply via email to