[PATCH 16/18] drm/amdgpu: increase mailbox polling timeout to 12s.
From: Horace Chen

Multiple FLRs may be waiting to complete at the same time, so waiting
for a mailbox event can take a long time. Increase the polling timeout
to 12s to reduce timeout failures.

Change-Id: I6b33170ba7dedf781b99ba6095127efce403af81
Signed-off-by: Horace Chen
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
index 1e91b9a..67e7857 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
@@ -24,7 +24,7 @@
 #ifndef __MXGPU_AI_H__
 #define __MXGPU_AI_H__
 
-#define AI_MAILBOX_TIMEDOUT	5000
+#define AI_MAILBOX_TIMEDOUT	12000
 
 enum idh_request {
 	IDH_REQ_GPU_INIT_ACCESS = 1,
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
index c791d73..f13dc6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
@@ -23,7 +23,7 @@
 #ifndef __MXGPU_VI_H__
 #define __MXGPU_VI_H__
 
-#define VI_MAILBOX_TIMEDOUT	5000
+#define VI_MAILBOX_TIMEDOUT	12000
 #define VI_MAILBOX_RESET_TIME	12
 
 /* VI mailbox messages request */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
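To illustrate why the larger budget helps, here is a minimal user-space sketch. It assumes the timeout is a millisecond budget consumed by a poll loop; fake_mailbox, poll_mailbox and the step size are hypothetical names for this example and are not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAILBOX_TIMEOUT_OLD_MS 5000
#define MAILBOX_TIMEOUT_NEW_MS 12000

/* Hypothetical mailbox model: the event becomes visible at ready_at_ms. */
struct fake_mailbox {
	int now_ms;
	int ready_at_ms;
};

static bool mailbox_event_ready(const struct fake_mailbox *mb)
{
	return mb->now_ms >= mb->ready_at_ms;
}

/* Poll every step_ms; return 0 on success, -1 once timeout_ms is spent. */
static int poll_mailbox(struct fake_mailbox *mb, int timeout_ms, int step_ms)
{
	int waited = 0;

	while (!mailbox_event_ready(mb)) {
		if (waited >= timeout_ms)
			return -1;
		mb->now_ms += step_ms;	/* stands in for a sleep of step_ms */
		waited += step_ms;
	}
	return 0;
}
```

With an event that only arrives after 9 seconds (for example because several resets are queued ahead of it), the 5000 ms budget gives up while the 12000 ms budget succeeds.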
[PATCH 17/18] drm/amdgpu:fix uvd ring fini routine
Finish the UVD enc rings, which were previously missed, and finish a
ring only if it was actually initialized.

Change-Id: Ib74237ca5adcb3b128c9b751fced0b7db7b09e86
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 331e34a..63b00eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -269,6 +269,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 
 int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 {
+	struct amdgpu_ring *ring;
+	int i;
 	kfree(adev->uvd.saved_bo);
 
 	amd_sched_entity_fini(&adev->uvd.ring.sched, &adev->uvd.entity);
@@ -277,7 +279,15 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 			      &adev->uvd.gpu_addr,
 			      (void **)&adev->uvd.cpu_addr);
 
-	amdgpu_ring_fini(&adev->uvd.ring);
+	ring = &adev->uvd.ring;
+	if (ring->adev)
+		amdgpu_ring_fini(ring);
+
+	for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i) {
+		ring = &adev->uvd.ring_enc[i];
+		if (ring->adev)
+			amdgpu_ring_fini(ring);
+	}
 
 	release_firmware(adev->uvd.fw);
 
-- 
2.7.4
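The guarded teardown in this patch can be sketched outside the kernel as follows. This is an illustration only: struct ring, struct uvd, ring_fini and MAX_ENC_RINGS are hypothetical stand-ins, and the owner pointer plays the role of ring->adev as an "was this ring initialized" marker.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENC_RINGS 2	/* hypothetical stand-in for AMDGPU_MAX_UVD_ENC_RINGS */

struct ring {
	void *owner;	/* NULL until the ring has been initialized */
	int fini_calls;	/* counts teardown calls, for the example only */
};

struct uvd {
	struct ring ring;
	struct ring ring_enc[MAX_ENC_RINGS];
};

static void ring_fini(struct ring *r)
{
	r->owner = NULL;
	r->fini_calls++;
}

/* Tear down the main ring and every enc ring, skipping uninitialized ones. */
static void uvd_fini_rings(struct uvd *uvd)
{
	size_t i;

	if (uvd->ring.owner)
		ring_fini(&uvd->ring);

	for (i = 0; i < MAX_ENC_RINGS; ++i) {
		if (uvd->ring_enc[i].owner)
			ring_fini(&uvd->ring_enc[i]);
	}
}
```

Because each ring is checked before it is finished, calling the teardown twice, or on a device where only some rings were set up, does not touch rings that were never initialized.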
[PATCH 08/18] drm/amdgpu:halt when vm fault
This is the only way we can debug VMC page fault issues.

Change-Id: Ifc8373c3c3c40d54ae94dedf1be74d6314faeb10
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 6 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 7 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 6c8040e..c17996e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -319,6 +319,12 @@ void gfxhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev,
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
 	WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 7ff7076..cc21c4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -561,6 +561,13 @@ void mmhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev, bool value)
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
+
 	WREG32_SOC15(MMHUB, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
-- 
2.7.4
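The read-modify-write pattern used here can be sketched in plain C. The bit positions below are invented for the example (the real field layout comes from the hardware register headers), and set_field is a hypothetical one-bit analogue of REG_SET_FIELD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FAULT_ENABLE_DEFAULT_SHIFT	0
#define CRASH_ON_NO_RETRY_SHIFT		1
#define CRASH_ON_RETRY_SHIFT		2

/* Replace a single-bit field in reg and return the new register value. */
static uint32_t set_field(uint32_t reg, unsigned int shift, uint32_t val)
{
	return (reg & ~(UINT32_C(1) << shift)) | ((val & 1u) << shift);
}

/*
 * Mirror of the patch's logic: when the default fault handling is
 * disabled, additionally set both crash-on-fault bits so the hardware
 * halts at the faulting access.
 */
static uint32_t fault_cntl_value(uint32_t reg, bool enable_default)
{
	reg = set_field(reg, FAULT_ENABLE_DEFAULT_SHIFT, enable_default);
	if (!enable_default) {
		reg = set_field(reg, CRASH_ON_NO_RETRY_SHIFT, 1);
		reg = set_field(reg, CRASH_ON_RETRY_SHIFT, 1);
	}
	return reg;
}
```

Note that, as in the patch, the crash bits are only ever set on this path; calling the function later with the default enabled leaves them as they were.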
[PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV
From: Horace Chen

The kernel sets the PCI power state to UNKNOWN after the driver is
unloaded. Because SRIOV exposes a faked PCI config space, the UNKNOWN
state is kept forever, and enabling MSI fails on driver reload while
the power state is UNKNOWN. Force the state to D0 for SRIOV to work
around this kernel flaw.

Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
Signed-off-by: Horace Chen
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 914c5bf..345406a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
+		int ret;
+		if (amdgpu_sriov_vf(adev) &&
+			adev->pdev->current_state == PCI_UNKNOWN){
+			/* If pci power state is unknown on the SRIOV platform,
+			 * it may be set in the remove device. We need to forcely
+			 * set it to D0 to enable the msi*/
+			adev->pdev->current_state = PCI_D0;
+		}
+		ret = pci_enable_msi(adev->pdev);
 		if (!ret) {
 			adev->irq.msi_enabled = true;
 			dev_info(adev->dev, "amdgpu: using MSI.\n");
-- 
2.7.4
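The workaround can be modeled in a few lines of user-space C. Everything here is a hypothetical stand-in: fake_pdev replaces the PCI device, and fake_enable_msi simply refuses any state other than D0, which is the behavior the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

enum power_state { STATE_D0, STATE_D3HOT, STATE_UNKNOWN };

struct fake_pdev {
	enum power_state current_state;
	bool is_vf;	/* true when running as an SRIOV virtual function */
};

/* Model of the MSI enable call: it only succeeds in D0. */
static int fake_enable_msi(const struct fake_pdev *pdev)
{
	return pdev->current_state == STATE_D0 ? 0 : -1;
}

/* For a VF stuck in UNKNOWN, force D0 before trying to enable MSI. */
static int irq_init_msi(struct fake_pdev *pdev)
{
	if (pdev->is_vf && pdev->current_state == STATE_UNKNOWN)
		pdev->current_state = STATE_D0;

	return fake_enable_msi(pdev);
}
```

The state is only overridden for a VF; on bare metal an UNKNOWN state is left alone and the enable call reports the failure as before.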
[PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9
Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3306667..f201510 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct amdgpu_ring *ring)
 	}
 }
 
+static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
+		EVENT_INDEX(4));
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
+		EVENT_INDEX(0));
+}
+
 static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 {
 	u32 ref_and_mask, reg_mem_engine;
@@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 			      nbio_hf_reg->hdp_flush_req_offset,
 			      nbio_hf_reg->hdp_flush_done_offset,
 			      ref_and_mask, ref_and_mask, 0x20);
+
+	if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
+		gfx_v9_0_ring_emit_vgt_flush(ring);
 }
 
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
-- 
2.7.4
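The emission order in this patch can be modeled in user space. This is only a sketch under stated assumptions: the opcode, the event numbers and the header layout below are placeholders chosen for illustration and are not the values from the driver headers, and struct ring and ring_write are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings, assumed for this example only. */
#define PKT_EVENT_WRITE		0x46u
#define EVT_VS_PARTIAL_FLUSH	0x0fu
#define EVT_VGT_FLUSH		0x24u
#define PKT_HEADER(op, n) \
	((3u << 30) | (((n) & 0x3fffu) << 16) | (((op) & 0xffu) << 8))
#define EVT_TYPE(x)		((uint32_t)(x) << 0)
#define EVT_INDEX(x)		((uint32_t)(x) << 8)

#define RING_DWORDS 16

enum ring_type { RING_GFX, RING_COMPUTE };

struct ring {
	enum ring_type type;
	uint32_t buf[RING_DWORDS];
	size_t wptr;	/* number of dwords written so far */
};

static void ring_write(struct ring *r, uint32_t dw)
{
	r->buf[r->wptr % RING_DWORDS] = dw;
	r->wptr++;
}

/* Two event-write packets: a partial VS flush, then the VGT flush. */
static void emit_vgt_flush(struct ring *r)
{
	ring_write(r, PKT_HEADER(PKT_EVENT_WRITE, 0));
	ring_write(r, EVT_TYPE(EVT_VS_PARTIAL_FLUSH) | EVT_INDEX(4));

	ring_write(r, PKT_HEADER(PKT_EVENT_WRITE, 0));
	ring_write(r, EVT_TYPE(EVT_VGT_FLUSH) | EVT_INDEX(0));
}

/* Tail of the HDP flush: only graphics rings get the extra VGT flush. */
static void emit_after_hdp_flush(struct ring *r)
{
	if (r->type == RING_GFX)
		emit_vgt_flush(r);
}
```

The ring-type check matters because the same HDP flush routine may be shared by compute rings, which should not receive graphics-only events.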
[PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint
Change-Id: I3a43901f5757b9fab629824a74ad9a4770a47b38
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7ca9cbe..7a20ba8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -59,16 +59,16 @@
 static const u32 golden_settings_vega10_hdp[] =
 {
-	0xf64, 0x0fff, 0x,
-	0xf65, 0x0fff, 0x,
-	0xf66, 0x0fff, 0x,
-	0xf67, 0x0fff, 0x,
-	0xf68, 0x0fff, 0x,
-	0xf6a, 0x0fff, 0x,
-	0xf6b, 0x0fff, 0x,
-	0xf6c, 0x0fff, 0x,
-	0xf6d, 0x0fff, 0x,
-	0xf6e, 0x0fff, 0x,
+	0xf64, 0x0fff, 0x,//surface0_low_bound
+	0xf65, 0x0fff, 0x,//surface0_upper_bound
+	0xf66, 0x0fff, 0x,//surface0_base
+	0xf67, 0x0fff, 0x,//surface0_info
+	0xf68, 0x0fff, 0x,//surface0_base_hi
+	0xf6a, 0x0fff, 0x,//surface1_low_bound
+	0xf6b, 0x0fff, 0x,//surface1_upper_bound
+	0xf6c, 0x0fff, 0x,//surface1_base
+	0xf6d, 0x0fff, 0x,//surface1_info
+	0xf6e, 0x0fff, 0x,//surface1_base_hi
 };
 
 static int gmc_v9_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
-- 
2.7.4
[PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename
Currently in_reset is only used in SRIOV GPU reset, and it will later be
used by other non-gfx hw components such as PSP. Moving it from gfx to
adev and renaming it to in_sriov_reset therefore makes more sense.

Change-Id: Ibb8546f6e4635a1cca740e57f6244f158c70a1e6
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a34c4cb..cc9a232 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1019,7 +1019,6 @@ struct amdgpu_gfx {
 	/* reset mask */
 	uint32_t			grbm_soft_reset;
 	uint32_t			srbm_soft_reset;
-	bool				in_reset;
 	/* s3/s4 mask */
 	bool				in_suspend;
 	/* NGG */
@@ -1588,6 +1587,7 @@ struct amdgpu_device {
 
 	/* record last mm index being written through WREG32*/
 	unsigned long last_mm_index;
+	bool				in_sriov_reset;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3467179..298a241 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2757,7 +2757,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
 
 	mutex_lock(&adev->virt.lock_reset);
 	atomic_inc(&adev->gpu_reset_counter);
-	adev->gfx.in_reset = true;
+	adev->in_sriov_reset = true;
 
 	/* block TTM */
 	resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
@@ -2868,7 +2868,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
 		dev_info(adev->dev, "GPU reset successed!\n");
 	}
 
-	adev->gfx.in_reset = false;
+	adev->in_sriov_reset = false;
 	mutex_unlock(&adev->virt.lock_reset);
 	return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 6ee348e..3f511a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4810,7 +4810,7 @@ static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
 
 	gfx_v8_0_kiq_setting(ring);
 
-	if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
@@ -4847,7 +4847,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 	struct vi_mqd *mqd = ring->mqd_ptr;
 	int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+	if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
 		memset((void *)mqd, 0, sizeof(struct vi_mqd_allocation));
 		((struct vi_mqd_allocation *)mqd)->dynamic_cu_mask = 0x;
 		((struct vi_mqd_allocation *)mqd)->dynamic_rb_mask = 0x;
@@ -4859,7 +4859,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct vi_mqd_allocation));
-	} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	} else if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index c133c85..21838f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2698,7 +2698,7 @@ static int gfx_v9_0_kiq_init_queue(struct amdgpu_ring *ring)
 
 	gfx_v9_0_kiq_setting(ring);
 
-	if (adev->gfx.in_reset) { /* for GPU_RESET case */
+	if (adev->in_sriov_reset) { /* for GPU_RESET case */
 		/* reset MQD to a clean status */
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));
@@ -2736,7 +2736,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
 	struct v9_mqd *mqd = ring->mqd_ptr;
 	int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-	if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+	if (!a
[PATCH 00/18] *** misc patches for SRIOV ***
Found a lot of patches that were missed in 4.12 staging.

Horace Chen (2):
  drm/amdgpu: Fix amdgpu reload failure under SRIOV
  drm/amdgpu: increase mailbox polling timeout to 12s.

Monk Liu (16):
  drm/amdgpu/sriov:fix missing error handling
  drm/amdgpu:no kiq in IH
  drm/amdgpu/sriov:move in_reset to adev and rename
  drm/amdgpu/sriov:don't load psp fw during gpu reset
  drm/amdgpu:make ctx_add_fence interruptible
  drm/amdgpu/sriov:fix memory leak after gpu reset
  drm/amdgpu:add hdp golden setting register name hint
  drm/amdgpu:halt when vm fault
  drm/amdgpu:insert TMZ_BEGIN
  drm/amdgpu:hdp flush should be put after it is initialized
  drm/amdgpu:add vgt_flush for gfx9
  drm/amdgpu:use formal register to trigger hdp invalidate
  drm/amdgpu:fix driver unloading bug
  drm/amdgpu/sriov: fix page fault issue of driver unload
  drm/amdgpu:fix uvd ring fini routine
  drm/amdgpu/sriov:init csb for gfxv9

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   9 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     |  12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c    |  14 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   8 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  10 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c    |  15 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  64 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  12 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      |   7 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c      | 100 +
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c   |   6 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c      |  32 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c    |   7 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h      |   2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h      |   2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |   2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c     |   4 +-
 20 files changed, 226 insertions(+), 93 deletions(-)

-- 
2.7.4
[PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9
Under SRIOV, the RLC needs the CSB registers to be initialized for world
switch; otherwise the clear state buffer behavior is not restored to the
current VF's scheme after switching back.

Change-Id: I3afd82875564c233060b740724bd8031095780f6
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index a577bbc..8d677cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2044,8 +2044,10 @@ static int gfx_v9_0_rlc_resume(struct amdgpu_device *adev)
 {
 	int r;
 
-	if (amdgpu_sriov_vf(adev))
+	if (amdgpu_sriov_vf(adev)) {
+		gfx_v9_0_init_csb(adev);
 		return 0;
+	}
 
 	gfx_v9_0_rlc_stop(adev);
 
-- 
2.7.4
[PATCH 13/18] drm/amdgpu:fix driver unloading bug
[SWDEV-126631] - fix the hypervisor save_vf failure that occurred after
the driver was removed:

1. Because the KIQ and KCQ were not unmapped, save_vf fails if the
   driver has freed the MQDs of the KIQ and KCQ.
2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free
   on the KIQ should be skipped.
3. The KCQ can be unmapped, and should be unmapped during hw_fini.
4. RLCV still needs to access other MC addresses from some hw even
   after the driver is unloaded, so we should not unbind the GART for VF.

Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
Signed-off-by: Horace Chen
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  | 5 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 60 +++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index f437008..2fee071 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
  */
 void amdgpu_gart_fini(struct amdgpu_device *adev)
 {
-	if (adev->gart.ready) {
+	/* gart is still used by other hw under SRIOV, don't unbind it */
+	if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
 		/* unbind pages */
 		amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4f6c68f..bf6656f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
 				      &ring->mqd_ptr);
 	}
 
+	/* don't deallocate KIQ mqd because the bo is still used by RLCV even
+	the guest VM is shutdown */
+	if (amdgpu_sriov_vf(adev))
+		return;
+
 	ring = &adev->gfx.kiq.ring;
 	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
 	amdgpu_bo_free_kernel(&ring->mqd_obj,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 44960b3..a577bbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
 	return r;
 }
 
+static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
+{
+	struct amdgpu_device *adev = kiq_ring->adev;
+	uint32_t scratch, tmp = 0;
+	int r, i;
+
+	r = amdgpu_gfx_scratch_get(adev, &scratch);
+	if (r) {
+		DRM_ERROR("Failed to get scratch reg (%d).\n", r);
+		return r;
+	}
+	WREG32(scratch, 0xCAFEDEAD);
+
+	r = amdgpu_ring_alloc(kiq_ring, 10);
+	if (r) {
+		DRM_ERROR("Failed to lock KIQ (%d).\n", r);
+		amdgpu_gfx_scratch_free(adev, scratch);
+		return r;
+	}
+
+	/* unmap queues */
+	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
+	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
+						PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
+						PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
+						PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
+						PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
+	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
+	amdgpu_ring_write(kiq_ring, 0);
+	amdgpu_ring_write(kiq_ring, 0);
+	amdgpu_ring_write(kiq_ring, 0);
+	/* write to scratch for completion */
+	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
+	amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
+	amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
+	amdgpu_ring_commit(kiq_ring);
+
+	for (i = 0; i < adev->usec_timeout; i++) {
+		tmp = RREG32(scratch);
+		if (tmp == 0xDEADBEEF)
+			break;
+		DRM_UDELAY(1);
+	}
+	if (i >= adev->usec_timeout) {
+		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
+		r = -EINVAL;
+	}
+	amdgpu_gfx_scratch_free(adev, scratch);
+	return r;
+}
+
+
 static int gfx_v9_0_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	int i, r;
 
 	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
 	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
 	if (amdgpu_sriov_vf(adev)) {
-		pr_debug("For SRIOV client, shouldn't do anything.\n");
+		/* disable KCQ to avoid CPC touch memory not valid anymore */
+
[PATCH 10/18] drm/amdgpu:hdp flush should be put after it is initialized
Change-Id: I635271ba4c89189017daa302a7fe5cd65c3eef06
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7a20ba8..3d035a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -696,12 +696,6 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
 	if (r)
 		return r;
 
-	/* After HDP is initialized, flush HDP.*/
-	if (adev->flags & AMD_IS_APU)
-		nbio_v7_0_hdp_flush(adev);
-	else
-		nbio_v6_1_hdp_flush(adev);
-
 	switch (adev->asic_type) {
 	case CHIP_RAVEN:
 		mmhub_v1_0_initialize_power_gating(adev);
@@ -724,6 +718,12 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
 	tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
 	WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+	/* After HDP is initialized, flush HDP.*/
+	if (adev->flags & AMD_IS_APU)
+		nbio_v7_0_hdp_flush(adev);
+	else
+		nbio_v6_1_hdp_flush(adev);
+
 	if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_ALWAYS)
 		value = false;
 	else
-- 
2.7.4
[PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate
Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index f201510..44960b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	gfx_v9_0_write_data_to_reg(ring, 0, true,
-				   SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
+				   SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
 }
 
 static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index fd7c72a..d5f3848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
 			  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
-	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
+	amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
 	amdgpu_ring_write(ring, 1);
 }
 
-- 
2.7.4
[PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload
Freeing the CSA BO in amdgpu_fini is too late because TTM has already
been torn down by then. Move the free earlier to avoid the page fault.

Change-Id: Id9c3f6aa8720cabbc9936ce21d8cf98af6e23bee
Signed-off-by: Monk Liu
Signed-off-by: Horace Chen
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      | 1 +
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 298a241..e0a17bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1795,10 +1795,8 @@ static int amdgpu_fini(struct amdgpu_device *adev)
 		adev->ip_blocks[i].status.late_initialized = false;
 	}
 
-	if (amdgpu_sriov_vf(adev)) {
-		amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
+	if (amdgpu_sriov_vf(adev))
 		amdgpu_virt_release_full_gpu(adev, false);
-	}
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3f511a9..40e5865 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2113,6 +2113,7 @@ static int gfx_v8_0_sw_fini(void *handle)
 	amdgpu_gfx_compute_mqd_sw_fini(adev);
 	amdgpu_gfx_kiq_free_ring(&adev->gfx.kiq.ring, &adev->gfx.kiq.irq);
 	amdgpu_gfx_kiq_fini(adev);
+	amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
 
 	gfx_v8_0_mec_fini(adev);
 	gfx_v8_0_rlc_fini(adev);
-- 
2.7.4
[PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN
FRAME_CONTROL(begin) is needed for vega10 due to a ucode logic change.
It fixes some random CTS failures when gfx preemption is enabled.

Change-Id: I0442337f6cde13ed2a33f033badcb522e0f35e2d
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 21838f4..3306667 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3764,6 +3764,12 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring)
 	amdgpu_ring_write_multiple(ring, (void *)&de_payload, sizeof(de_payload) >> 2);
 }
 
+static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+
 static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
 {
 	uint32_t dw2 = 0;
@@ -3771,6 +3777,8 @@ static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
 	if (amdgpu_sriov_vf(ring->adev))
 		gfx_v9_0_ring_emit_ce_meta(ring);
 
+	gfx_v9_0_ring_emit_tmz(ring, true);
+
 	dw2 |= 0x8000; /* set load_enable otherwise this package is just NOPs */
 	if (flags & AMDGPU_HAVE_CTX_SWITCH) {
 		/* set load_global_config & load_global_uconfig */
@@ -3821,12 +3829,6 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
 		ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
 }
 
-static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
-{
-	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
-	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
-}
-
 static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
 	struct amdgpu_device *adev = ring->adev;
-- 
2.7.4
[PATCH 01/18] drm/amdgpu/sriov:fix missing error handling
Change-Id: Ifc6942ed0221f3134bfba4d66fde743484191da3
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e390c01..d1ac27d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -841,8 +841,11 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	if (amdgpu_sriov_vf(adev)) {
 		r = amdgpu_map_static_csa(adev, &fpriv->vm, &fpriv->csa_va);
-		if (r)
+		if (r) {
+			amdgpu_vm_fini(adev, &fpriv->vm);
+			kfree(fpriv);
 			goto out_suspend;
+		}
 	}
 
 	mutex_init(&fpriv->bo_list_lock);
-- 
2.7.4
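The patch carries no description, so here is a minimal sketch of the error-path rule it appears to apply: anything set up before a failing step has to be undone before bailing out. All names (fpriv, vm, open_file, live_allocs) are hypothetical user-space stand-ins, and the csa_should_fail flag simulates a failing CSA mapping.

```c
#include <assert.h>
#include <stdlib.h>

struct vm {
	int live;	/* 1 while the VM is initialized */
};

struct fpriv {
	struct vm vm;
};

static int live_allocs;	/* outstanding fpriv allocations, for the example */

/*
 * Open path: allocate per-file state, initialize its VM, then map the CSA.
 * If the mapping step fails, tear down the VM and free the allocation
 * instead of leaking both.
 */
static int open_file(int csa_should_fail, struct fpriv **out)
{
	struct fpriv *fpriv = calloc(1, sizeof(*fpriv));

	*out = NULL;
	if (!fpriv)
		return -1;
	live_allocs++;
	fpriv->vm.live = 1;

	if (csa_should_fail) {
		fpriv->vm.live = 0;	/* stands in for the VM teardown */
		free(fpriv);
		live_allocs--;
		return -2;
	}

	*out = fpriv;
	return 0;
}
```

Without the two cleanup steps, every failed open would leave one VM and one allocation behind.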
[PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset
A GPU reset reruns every hw_init, so ucode_init_bo is invoked again.
Skip the fw_buf allocation during SRIOV GPU reset to avoid a memory
leak.

Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h       | 3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++
 2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6ff2959..3d0c633 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
 
 	/* gpu info firmware data pointer */
 	const struct firmware *gpu_info_fw;
+
+	void *fw_buf_ptr;
+	uint64_t fw_buf_mc;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index f306374..6564902 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
 int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 {
 	struct amdgpu_bo **bo = &adev->firmware.fw_buf;
-	uint64_t fw_mc_addr;
-	void *fw_buf_ptr = NULL;
 	uint64_t fw_offset = 0;
 	int i, err;
 	struct amdgpu_firmware_info *ucode = NULL;
@@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 		return 0;
 	}
 
-	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
-				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
-				NULL, NULL, 0, bo);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
-		goto failed;
-	}
+	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
+		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
+					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
+					NULL, NULL, 0, bo);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
+			goto failed;
+		}
 
-	err = amdgpu_bo_reserve(*bo, false);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
-		goto failed_reserve;
-	}
+		err = amdgpu_bo_reserve(*bo, false);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
+			goto failed_reserve;
+		}
 
-	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				&fw_mc_addr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
-		goto failed_pin;
-	}
+		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					&adev->firmware.fw_buf_mc);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
+			goto failed_pin;
+		}
 
-	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
-		goto failed_kmap;
-	}
+		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
+			goto failed_kmap;
+		}
 
-	amdgpu_bo_unreserve(*bo);
+		amdgpu_bo_unreserve(*bo);
+	}
 
-	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
+	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
 
 	/*
 	 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
@@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 		ucode = &adev->firmware.ucode[i];
 		if (ucode->fw) {
 			header = (const struct common_firmware_header *)ucode->fw->data;
-			amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + fw_offset,
-						    (void *)((uint8_t *)fw_buf_ptr + fw_offset));
+			amdgpu_ucode_init_single_fw(adev, ucode, adev->firmware.fw_buf_mc + fw_offset,
+						    adev->firmware.fw_buf_ptr + fw
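The allocate-once idea behind this patch can be sketched in user space. The names below (struct firmware, ucode_init, alloc_count) are hypothetical and malloc stands in for the buffer object creation; the point is that the buffer pointer is kept in long-lived state so a reset pass can reuse it instead of allocating a second buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct firmware {
	void *buf;		/* persists across resets, like fw_buf_ptr */
	size_t size;
	int alloc_count;	/* number of allocations, for the example */
};

/*
 * Initialize the firmware buffer. Allocation is skipped for a VF that is
 * in the middle of a reset, because the buffer from the first init is
 * still owned by fw. The buffer is cleared in both cases.
 */
static int ucode_init(struct firmware *fw, bool is_vf, bool in_sriov_reset)
{
	if (!is_vf || !in_sriov_reset) {
		fw->buf = malloc(fw->size);
		if (!fw->buf)
			return -1;
		fw->alloc_count++;
	}

	if (!fw->buf)
		return -1;	/* reset requested before any first init */

	memset(fw->buf, 0, fw->size);
	return 0;
}
```

This also shows why the pointer had to move out of a local variable: a local would be uninitialized on the reset path, where the allocation block is skipped.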
[PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset
At least for SRIOV, we found that reloading the PSP fw during GPU reset
causes a PSP hang.

Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 8a1ee97..4eee2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
 
 static int psp_hw_start(struct psp_context *psp)
 {
+	struct amdgpu_device *adev = psp->adev;
 	int ret;
 
-	ret = psp_bootloader_load_sysdrv(psp);
-	if (ret)
-		return ret;
+	if (amdgpu_sriov_vf(adev) && !adev->in_sriov_reset) {
+		ret = psp_bootloader_load_sysdrv(psp);
+		if (ret)
+			return ret;
 
-	ret = psp_bootloader_load_sos(psp);
-	if (ret)
-		return ret;
+		ret = psp_bootloader_load_sos(psp);
+		if (ret)
+			return ret;
+	}
 
 	ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
 	if (ret)
-- 
2.7.4
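The guard in this diff is worth reading against the commit message. As posted, the bootloader stages run only when the device is a VF that is not in reset, which also skips them on bare metal. If the intent is to skip the load only for a VF that is mid-reset (an assumption drawn from the subject line, not something the patch states), the condition would be the one used in the fw_buf patch of this series. The two predicates are compared below with hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Guard as it appears in the posted diff. */
static bool load_bootloader_as_posted(bool is_vf, bool in_sriov_reset)
{
	return is_vf && !in_sriov_reset;
}

/* Guard that skips the load only for a VF in reset (assumed intent). */
static bool load_bootloader_skip_only_vf_reset(bool is_vf, bool in_sriov_reset)
{
	return !is_vf || !in_sriov_reset;
}
```

The two agree for a VF in either state and differ only for a non-VF device, where the posted form never loads the bootloader stages.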
[PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible
Otherwise a GPU hang leaves the application impossible to kill.

Change-Id: I6051b5b3ae1188983f49325a2438c84a6c12374a
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 14 +-
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cc9a232..6ff2959 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -736,8 +736,8 @@ struct amdgpu_ctx_mgr {
 struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);
 int amdgpu_ctx_put(struct amdgpu_ctx *ctx);
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
-			      struct dma_fence *fence);
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+			      struct dma_fence *fence, uint64_t *seq);
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
 				   struct amdgpu_ring *ring, uint64_t seq);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b59749d..4ac7a92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1043,6 +1043,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	struct amd_sched_entity *entity = &p->ctx->rings[ring->idx].entity;
 	struct amdgpu_job *job;
 	unsigned i;
+	uint64_t seq;
+	int r;
 
 	amdgpu_mn_lock(p->mn);
 
@@ -1071,8 +1073,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	job->owner = p->filp;
 	job->fence_ctx = entity->fence_context;
 	p->fence = dma_fence_get(&job->base.s_fence->finished);
-	cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
-	job->uf_sequence = cs->out.handle;
+	r = amdgpu_ctx_add_fence(p->ctx, ring, p->fence, &seq);
+	if (r) {
+		dma_fence_put(p->fence);
+		return r;
+	}
+
+	cs->out.handle = seq;
+	job->uf_sequence = seq;
 	amdgpu_job_free_resources(job);
 
 	trace_amdgpu_cs_ioctl(job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index a11e443..97f8be4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -246,8 +246,8 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
 	return 0;
 }
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
-			      struct dma_fence *fence)
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+			      struct dma_fence *fence, uint64_t* handler)
 {
 	struct amdgpu_ctx_ring *cring = & ctx->rings[ring->idx];
 	uint64_t seq = cring->sequence;
@@ -258,9 +258,11 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
 	other = cring->fences[idx];
 	if (other) {
 		signed long r;
-		r = dma_fence_wait_timeout(other, false, MAX_SCHEDULE_TIMEOUT);
-		if (r < 0)
+		r = dma_fence_wait_timeout(other, true, MAX_SCHEDULE_TIMEOUT);
+		if (r < 0) {
 			DRM_ERROR("Error (%ld) waiting for fence!\n", r);
+			return -ERESTARTSYS;
+		}
 	}
 
 	dma_fence_get(fence);
@@ -271,8 +273,10 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
 	spin_unlock(&ctx->ring_lock);
 
 	dma_fence_put(other);
+	if (handler)
+		*handler = seq;
 
-	return seq;
+	return 0;
 }
 
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
-- 
2.7.4
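The signature change follows a common C convention: the return value carries the status, and the sequence number moves to an optional out-parameter. A minimal userspace sketch of that convention follows. The names add_fence_model, wait_interrupted and seq_out are hypothetical, and -EINTR stands in for the kernel's -ERESTARTSYS, which is not available in userspace headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the new calling convention: status in the return
 * value, sequence number through an optional out-parameter.
 */
static int add_fence_model(uint64_t *next_seq, bool wait_interrupted,
			   uint64_t *seq_out)
{
	uint64_t seq;

	/* An interrupted wait fails early; no sequence number is consumed. */
	if (wait_interrupted)
		return -EINTR;

	seq = (*next_seq)++;
	if (seq_out)
		*seq_out = seq;

	return 0;
}
```

A caller has to check the return value before it uses the sequence number, as amdgpu_cs_submit() does in the patch. One thing that may be worth a second look in review: the new early return in amdgpu_cs_submit() comes after amdgpu_mn_lock(p->mn), and the hunk shown does not include a matching unlock on that path.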
[PATCH 02/18] drm/amdgpu:no kiq in IH
Change-Id: I4deb65675d2531236b2f4e2bc6f015c657546464
Signed-off-by: Monk Liu
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 67610f7..c291e33 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -219,9 +219,9 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev)
 			wptr, adev->irq.ih.rptr, tmp);
 		adev->irq.ih.rptr = tmp;
 
-		tmp = RREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
+		tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
 		tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-		WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
+		WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
 	}
 	return (wptr & adev->irq.ih.ptr_mask);
 }
-- 
2.7.4
Re: [PATCH 1/1] amdgpu: move asic id table to a separate file
looks fine to me, feel free to add my RB. Reviewed-by: Junwei Zhang BTW, we also has 1 or 2 patch to improve the name parsing. Please also take a look. Jerry On 05/11/2017 05:10 AM, Li, Samuel wrote: Also attach a sample ids file for reference. The names are from marketing, not related to source code and no reviews necessary here:) It can be put in directory /usr/share/libdrm. Sam -Original Message- From: Li, Samuel Sent: Wednesday, May 10, 2017 4:57 PM To: amd-gfx@lists.freedesktop.org Cc: Yuan, Xiaojie ; Li, Samuel Subject: [PATCH 1/1] amdgpu: move asic id table to a separate file From: Xiaojie Yuan Change-Id: I12216da14910f5e2b0970bc1fafc2a20b0ef1ba9 Signed-off-by: Samuel Li --- amdgpu/Makefile.am | 2 + amdgpu/Makefile.sources | 2 +- amdgpu/amdgpu_asic_id.c | 198 +++ amdgpu/amdgpu_asic_id.h | 165 --- amdgpu/amdgpu_device.c | 28 +-- amdgpu/amdgpu_internal.h | 10 +++ 6 files changed, 232 insertions(+), 173 deletions(-) create mode 100644 amdgpu/amdgpu_asic_id.c delete mode 100644 amdgpu/amdgpu_asic_id.h diff --git a/amdgpu/Makefile.am b/amdgpu/Makefile.am index cf7bc1b..ecf9e82 100644 --- a/amdgpu/Makefile.am +++ b/amdgpu/Makefile.am @@ -30,6 +30,8 @@ AM_CFLAGS = \ $(PTHREADSTUBS_CFLAGS) \ -I$(top_srcdir)/include/drm +AM_CPPFLAGS = -DAMDGPU_ASIC_ID_TABLE=\"${datadir}/libdrm/amdgpu.ids\" + libdrm_amdgpu_la_LTLIBRARIES = libdrm_amdgpu.la libdrm_amdgpu_ladir = $(libdir) libdrm_amdgpu_la_LDFLAGS = -version-number 1:0:0 -no-undefined diff --git a/amdgpu/Makefile.sources b/amdgpu/Makefile.sources index 487b9e0..bc3abaa 100644 --- a/amdgpu/Makefile.sources +++ b/amdgpu/Makefile.sources @@ -1,5 +1,5 @@ LIBDRM_AMDGPU_FILES := \ - amdgpu_asic_id.h \ + amdgpu_asic_id.c \ amdgpu_bo.c \ amdgpu_cs.c \ amdgpu_device.c \ diff --git a/amdgpu/amdgpu_asic_id.c b/amdgpu/amdgpu_asic_id.c new file mode 100644 index 000..d50e21a --- /dev/null +++ b/amdgpu/amdgpu_asic_id.c @@ -0,0 +1,198 @@ +/* + * Copyright © 2017 Advanced Micro Devices, Inc. + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. 
+ * + */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include +#include +#include +#include +#include +#include + +#include "amdgpu_drm.h" +#include "amdgpu_internal.h" + +static int parse_one_line(const char *line, struct amdgpu_asic_id *id) +{ + char *buf; + char *s_did; + char *s_rid; + char *s_name; + char *endptr; + int r = 0; + + buf = strdup(line); + if (!buf) + return -ENOMEM; + + /* ignore empty line and commented line */ + if (strlen(line) == 0 || line[0] == '#') { + r = -EAGAIN; + goto out; + } + + /* device id */ + s_did = strtok(buf, ","); + if (!s_did) { + r = -EINVAL; + goto out; + } + + id->did = strtol(s_did, &endptr, 16); + if (*endptr) { + r = -EINVAL; + goto out; + } + + /* revision id */ + s_rid = strtok(NULL, ","); + if (!s_rid) { + r = -EINVAL; + goto out; + } + + id->rid = strtol(s_rid, &endptr, 16); + if (*endptr) { + r = -EINVAL; + goto out; + } + + /* marketing name */ + s_name = strtok(NULL, ","); + if (!s_name) { + r = -EINVAL; + goto out; + } + + id->marketing_name = strdup(s_name); + if (id->marketing_name == NULL) { + r = -EINVAL; + goto out; + } + +out: + free(buf); + + return r; +} + +int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table) +{ + struct amdgpu_asic_id *asic_id_table; + struct amdgpu_asic_i
Re: [PATCH] drm/amdgpu/psp: declare raven psp firmware
On 09/16/2017 05:37 AM, Alex Deucher wrote:
> So it gets picked up properly by the kernel.
>
> Signed-off-by: Alex Deucher

Reviewed-by: Junwei Zhang

> ---
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> index 6ec5c9f..77cab1f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> @@ -35,6 +35,8 @@
>  #include "raven1/GC/gc_9_1_offset.h"
>  #include "raven1/SDMA0/sdma0_4_1_offset.h"
>
> +MODULE_FIRMWARE("amdgpu/raven_asd.bin");
> +
>  static int psp_v10_0_get_fw_type(struct amdgpu_firmware_info *ucode,
>  				 enum psp_gfx_fw_type *type)
>  {
Re: [PATCH] drm/amdkfd: check for null dev to avoid a null pointer dereference
On Fri, Sep 8, 2017 at 5:13 PM, Colin King wrote:
> From: Colin Ian King
>
> The call to kfd_device_by_id can potentially return null, so check that
> dev is null and return with -EINVAL to avoid a null pointer dereference.
>
> Detected by CoverityScan CID#1454629 ("Dereference null return value")
>
> Fixes: 5d71dbc3a588 ("drm/amdkfd: Implement image tiling mode support v2")
> Signed-off-by: Colin Ian King
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index e4a8c2e52cb2..660b3fbade41 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -892,6 +892,8 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
>  	int err = 0;
>
>  	dev = kfd_device_by_id(args->gpu_id);
> +	if (!dev)
> +		return -EINVAL;
>
>  	dev->kfd2kgd->get_tile_config(dev->kgd, &config);
>
> --
> 2.14.1
>

Thanks!
Applied to my -fixes tree
Re: [PATCH 10/11] drm/amdkfd: Print event limit messages only once per process
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> To avoid spamming the log.
>
> Signed-off-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 -
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 5979158..944abfa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -292,7 +292,10 @@ static int create_signal_event(struct file *devkfd,
>  				struct kfd_event *ev)
>  {
>  	if (p->signal_event_count == KFD_SIGNAL_EVENT_LIMIT) {
> -		pr_warn("Signal event wasn't created because limit was reached\n");
> +		if (!p->signal_event_limit_reached) {
> +			pr_warn("Signal event wasn't created because limit was reached\n");
> +			p->signal_event_limit_reached = true;
> +		}
>  		return -ENOMEM;
>  	}
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index bb71697..a546d01 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -532,6 +532,7 @@ struct kfd_process {
>  	struct list_head signal_event_pages;
>  	u32 next_nonsignal_event_id;
>  	size_t signal_event_count;
> +	bool signal_event_limit_reached;
>  };
>
>  /**
> --
> 2.7.4
>

This patch is: Reviewed-by: Oded Gabbay
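The per-process flag can be modelled in a few lines of userspace C. This is only a sketch of the idea in the patch: struct process_model, EVENT_LIMIT and the warnings_logged counter (which stands in for pr_warn() output) are hypothetical names, not driver symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define EVENT_LIMIT 4	/* hypothetical small limit for the model */

struct process_model {
	size_t signal_event_count;
	bool signal_event_limit_reached;
	unsigned int warnings_logged;	/* stands in for pr_warn() output */
};

static int create_signal_event_model(struct process_model *p)
{
	if (p->signal_event_count == EVENT_LIMIT) {
		/* Warn on the first failure only; later ones stay silent. */
		if (!p->signal_event_limit_reached) {
			p->warnings_logged++;
			p->signal_event_limit_reached = true;
		}
		return -ENOMEM;
	}

	p->signal_event_count++;
	return 0;
}
```

Every call past the limit still fails with -ENOMEM; only the logging is suppressed after the first hit. In this model the flag is never cleared, so a process that later drops below the limit and hits it again stays silent.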
Re: [PATCH 09/11] drm/amdkfd: Fix kernel-queue wrapping bugs
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> Avoid intermediate negative numbers when doing calculations with a mix
> of signed and unsigned variables where implicit conversions can lead
> to unexpected results.
>
> When kernel queue buffer wraps around to 0, we need to check that rptr
> won't be overwritten by the new packet.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 9ebb4c1..1c66334 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -210,6 +210,11 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  	uint32_t wptr, rptr;
>  	unsigned int *queue_address;
>
> +	/* When rptr == wptr, the buffer is empty.

Start comment text in a new line. First line should be just /*

> +	 * When rptr == wptr + 1, the buffer is full.
> +	 * It is always rptr that advances to the position of wptr, rather than
> +	 * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
> +	 */
>  	rptr = *kq->rptr_kernel;
>  	wptr = *kq->wptr_kernel;
>  	queue_address = (unsigned int *)kq->pq_kernel_addr;
> @@ -219,11 +224,10 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  	pr_debug("wptr: %d\n", wptr);
>  	pr_debug("queue_address 0x%p\n", queue_address);
>
> -	available_size = (rptr - 1 - wptr + queue_size_dwords) %
> +	available_size = (rptr + queue_size_dwords - 1 - wptr) %
>  							queue_size_dwords;
>
> -	if (packet_size_in_dwords >= queue_size_dwords ||
> -			packet_size_in_dwords >= available_size) {
> +	if (packet_size_in_dwords > available_size) {
>  		/*
>  		 * make sure calling functions know
>  		 * acquire_packet_buffer() failed
> @@ -233,6 +237,14 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  	}
>
>  	if (wptr + packet_size_in_dwords >= queue_size_dwords) {
> +		/* make sure after rolling back to position 0, there is
> +		 * still enough space.
> +		 */
> +		if (packet_size_in_dwords >= rptr) {
> +			*buffer_ptr = NULL;
> +			return -ENOMEM;
> +		}

I don't think the condition is correct.
Suppose queue_size_dwords == 100, wptr == rptr == 50 (queue is empty) and
we have a new packet with size of 70.
Now, wptr + size is 120, which is >= 100.
However, 70 >= rptr (50), which will give us -ENOMEM, but this is not the
correct condition, because the packet *does* have enough room in the queue.

I think the condition should be:
if (packet_size_in_dwords - (queue_size_dwords - wptr) >= rptr)
but please check this.

> +		/* fill nops, roll back and start at position 0 */
>  		while (wptr > 0) {
>  			queue_address[wptr] = kq->nop_packet;
>  			wptr = (wptr + 1) % queue_size_dwords;
> --
> 2.7.4
>
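To make the numbers in the review concrete, here is a small userspace sketch of the free-space formula from the patch together with both versions of the wrap check. The function names are hypothetical, and the model assumes rptr and wptr are already reduced modulo the queue size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Free space as computed in the patch. One slot is kept unused so that
 * rptr == wptr always means "empty". Assumes rptr, wptr < size.
 */
static uint32_t available_dwords(uint32_t rptr, uint32_t wptr, uint32_t size)
{
	return (rptr + size - 1 - wptr) % size;
}

/*
 * Wrap check as posted: the tail is filled with NOPs, the packet starts
 * at index 0 and must end before rptr.
 */
static bool wrap_rejected_as_posted(uint32_t packet, uint32_t rptr)
{
	return packet >= rptr;
}

/*
 * Wrap check as proposed in the review: only the part of the packet that
 * spills past the end of the buffer is compared against rptr.
 * Only meaningful when wptr + packet >= size.
 */
static bool wrap_rejected_as_proposed(uint32_t packet, uint32_t rptr,
				      uint32_t wptr, uint32_t size)
{
	return packet - (size - wptr) >= rptr;
}
```

With queue_size_dwords == 100, wptr == rptr == 50 and a 70-dword packet, available_dwords() is 99, the posted check rejects the packet and the proposed check accepts it. Which one is right seems to depend on whether a packet may straddle the end of the buffer. The "fill nops, roll back and start at position 0" comment in the patch suggests the packet is placed contiguously from index 0, in which case only rptr dwords are usable after the wrap.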
Re: [PATCH 06/11] drm/amdkfd: Use VMID bitmap from KGD
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote: > From: Yong Zhao > > The hard-coded values related to VMID were removed in KFD, as those > values can be calculated in the KFD initialization function. > > Signed-off-by: Yong Zhao > Signed-off-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c| 9 ++--- > drivers/gpu/drm/amd/amdkfd/kfd_device.c| 7 +++ > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 13 ++--- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 4 > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++ > drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +- > 6 files changed, 23 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c > b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c > index 0aa021a..7d5635f 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c > @@ -769,13 +769,8 @@ int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, > struct kfd_process *p) > union GRBM_GFX_INDEX_BITS reg_gfx_index; > struct kfd_process_device *pdd; > struct dbg_wave_control_info wac_info; > - int temp; > - int first_vmid_to_scan = 8; > - int last_vmid_to_scan = 15; > - > - first_vmid_to_scan = ffs(dev->shared_resources.compute_vmid_bitmap) - > 1; > - temp = dev->shared_resources.compute_vmid_bitmap >> > first_vmid_to_scan; > - last_vmid_to_scan = first_vmid_to_scan + ffz(temp); > + int first_vmid_to_scan = dev->vm_info.first_vmid_kfd; > + int last_vmid_to_scan = dev->vm_info.last_vmid_kfd; > > reg_sq_cmd.u32All = 0; > status = 0; > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index ff3f97c..abf91b0 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -223,9 +223,16 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, > const struct kgd2kfd_shared_resources *gpu_resources) > { > unsigned int size; > + unsigned int vmid_bitmap_kfd; > > 
kfd->shared_resources = *gpu_resources; > > + vmid_bitmap_kfd = kfd->shared_resources.compute_vmid_bitmap; Unnecessary copy, just use kfd->shared_resources.compute_vmid_bitmap in the below lines. If you want a shorter name, use a pointer. > + kfd->vm_info.first_vmid_kfd = ffs(vmid_bitmap_kfd) - 1; > + kfd->vm_info.last_vmid_kfd = fls(vmid_bitmap_kfd) - 1; > + kfd->vm_info.vmid_num_kfd = kfd->vm_info.last_vmid_kfd > + - kfd->vm_info.first_vmid_kfd + 1; > + > /* calculate max size of mqds needed for queues */ > size = max_num_of_queues_per_device * > kfd->device_info->mqd_size_aligned; > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 5da7ef4..897ff083 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -113,11 +113,11 @@ static int allocate_vmid(struct device_queue_manager > *dqm, > if (dqm->vmid_bitmap == 0) > return -ENOMEM; > > - bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap, > CIK_VMID_NUM); > + bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap, > + dqm->dev->vm_info.vmid_num_kfd); > clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap); > > - /* Kaveri kfd vmid's starts from vmid 8 */ > - allocated_vmid = bit + KFD_VMID_START_OFFSET; > + allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd; > pr_debug("vmid allocation %d\n", allocated_vmid); > qpd->vmid = allocated_vmid; > q->properties.vmid = allocated_vmid; > @@ -132,7 +132,7 @@ static void deallocate_vmid(struct device_queue_manager > *dqm, > struct qcm_process_device *qpd, > struct queue *q) > { > - int bit = qpd->vmid - KFD_VMID_START_OFFSET; > + int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd; > > /* Release the vmid mapping */ > set_pasid_vmid_mapping(dqm, 0, qpd->vmid); > @@ -507,7 +507,7 @@ static int initialize_nocpsch(struct device_queue_manager > *dqm) > dqm->allocated_queues[pipe] |= 1 << queue; > } > > - 
dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1; > + dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1; > dqm->sdma_bitmap = (1 << CIK_SDMA_QUEUES) - 1; > > return 0; > @@ -613,8 +613,7 @@ static int set_sched_resources(struct > device_queue_manager *dqm) > int i, mec; > struct scheduling_resources res; > > - res.vmid_mask = (1 << VMID_PER_DEVICE) - 1; > - res.vmid_mask <<= KFD_VM
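The VMID range computation in kgd2kfd_device_init() can be tried out in userspace. The sketch below is a model under assumptions: ffs_model() and fls_model() are hand-written stand-ins that follow the 1-based convention of the kernel's ffs() and fls(), and struct vm_info_model is a hypothetical mirror of the three vm_info fields used in the patch.

```c
#include <stdint.h>

struct vm_info_model {	/* hypothetical stand-in for kfd->vm_info */
	int first_vmid_kfd;
	int last_vmid_kfd;
	int vmid_num_kfd;
};

/* 1-based index of the least significant set bit, 0 if no bit is set. */
static int ffs_model(uint32_t x)
{
	int i;

	for (i = 0; i < 32; i++)
		if (x & (UINT32_C(1) << i))
			return i + 1;
	return 0;
}

/* 1-based index of the most significant set bit, 0 if no bit is set. */
static int fls_model(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

static struct vm_info_model vm_info_from_bitmap(uint32_t bitmap)
{
	struct vm_info_model info;

	info.first_vmid_kfd = ffs_model(bitmap) - 1;
	info.last_vmid_kfd = fls_model(bitmap) - 1;
	info.vmid_num_kfd = info.last_vmid_kfd - info.first_vmid_kfd + 1;
	return info;
}
```

For a bitmap of 0xFF00 this gives first 8, last 15 and a count of 8, matching the hard-coded values the patch removes. The count is derived from the first and last set bits only, so it equals the number of usable VMIDs only if the bitmap is contiguous; 0x8100 would also yield 8.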
Re: [PATCH 08/11] drm/amdkfd: Drop _nocpsch suffix from shared functions
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote: > From: Yong Zhao > > Several functions in DQM are shared between cpsch and nocpsch code. > Remove the misleading _nocpsch suffix from their names. > > Signed-off-by: Yong Zhao > Signed-off-by: Felix Kuehling > --- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 24 > +++--- > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 0ecea67..169e061 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -386,7 +386,7 @@ static int update_queue(struct device_queue_manager *dqm, > struct queue *q) > return retval; > } > > -static struct mqd_manager *get_mqd_manager_nocpsch( > +static struct mqd_manager *get_mqd_manager( > struct device_queue_manager *dqm, enum KFD_MQD_TYPE type) > { > struct mqd_manager *mqd; > @@ -407,7 +407,7 @@ static struct mqd_manager *get_mqd_manager_nocpsch( > return mqd; > } > > -static int register_process_nocpsch(struct device_queue_manager *dqm, > +static int register_process(struct device_queue_manager *dqm, > struct qcm_process_device *qpd) > { > struct device_process_node *n; > @@ -431,7 +431,7 @@ static int register_process_nocpsch(struct > device_queue_manager *dqm, > return retval; > } > > -static int unregister_process_nocpsch(struct device_queue_manager *dqm, > +static int unregister_process(struct device_queue_manager *dqm, > struct qcm_process_device *qpd) > { > int retval; > @@ -513,7 +513,7 @@ static int initialize_nocpsch(struct device_queue_manager > *dqm) > return 0; > } > > -static void uninitialize_nocpsch(struct device_queue_manager *dqm) > +static void uninitialize(struct device_queue_manager *dqm) > { > int i; > > @@ -1097,10 +1097,10 @@ struct device_queue_manager > *device_queue_manager_init(struct kfd_dev *dev) > dqm->ops.stop = stop_cpsch; > 
dqm->ops.destroy_queue = destroy_queue_cpsch; > dqm->ops.update_queue = update_queue; > - dqm->ops.get_mqd_manager = get_mqd_manager_nocpsch; > - dqm->ops.register_process = register_process_nocpsch; > - dqm->ops.unregister_process = unregister_process_nocpsch; > - dqm->ops.uninitialize = uninitialize_nocpsch; > + dqm->ops.get_mqd_manager = get_mqd_manager; > + dqm->ops.register_process = register_process; > + dqm->ops.unregister_process = unregister_process; > + dqm->ops.uninitialize = uninitialize; > dqm->ops.create_kernel_queue = create_kernel_queue_cpsch; > dqm->ops.destroy_kernel_queue = destroy_kernel_queue_cpsch; > dqm->ops.set_cache_memory_policy = set_cache_memory_policy; > @@ -1112,11 +1112,11 @@ struct device_queue_manager > *device_queue_manager_init(struct kfd_dev *dev) > dqm->ops.create_queue = create_queue_nocpsch; > dqm->ops.destroy_queue = destroy_queue_nocpsch; > dqm->ops.update_queue = update_queue; > - dqm->ops.get_mqd_manager = get_mqd_manager_nocpsch; > - dqm->ops.register_process = register_process_nocpsch; > - dqm->ops.unregister_process = unregister_process_nocpsch; > + dqm->ops.get_mqd_manager = get_mqd_manager; > + dqm->ops.register_process = register_process; > + dqm->ops.unregister_process = unregister_process; > dqm->ops.initialize = initialize_nocpsch; > - dqm->ops.uninitialize = uninitialize_nocpsch; > + dqm->ops.uninitialize = uninitialize; > dqm->ops.set_cache_memory_policy = set_cache_memory_policy; > break; > default: > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx This patch is: Reviewed-by: Oded Gabbay ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 05/11] drm/amdkfd: Fix incorrect destroy_mqd parameter
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> When uninitializing a kernel queue.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 0c82446..09356d0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -184,7 +184,7 @@ static void uninitialize(struct kernel_queue *kq)
>  	if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
>  		kq->mqd->destroy_mqd(kq->mqd,
>  					NULL,
> -					false,
> +					KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
>  					KFD_UNMAP_LATENCY_MS,
>  					kq->queue->pipe,
>  					kq->queue->queue);
> --
> 2.7.4
>

This patch is: Reviewed-by: Oded Gabbay
Re: [PATCH 04/11] drm/amdkfd: Adjust dequeue latencies and timeouts
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote: > Adjust latencies and timeouts for dequeueing with HWS and consolidate > them in one place. Make them longer to allow long running waves to > complete without causing a timeout. The timeout is twice as long as the > latency plus some buffer to make sure we don't detect a timeout > prematurely. > > Change timeouts for dequeueing HQDs without HWS. KFD_UNMAP_LATENCY is > more consistent with what the HWS does for user queues. > > Signed-off-by: Yong Zhao > Signed-off-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 4 +++- > drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 --- > 5 files changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > index 3db6a31..5da7ef4 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c > @@ -323,7 +323,7 @@ static int destroy_queue_nocpsch(struct > device_queue_manager *dqm, > > retval = mqd->destroy_mqd(mqd, q->mqd, > KFD_PREEMPT_TYPE_WAVEFRONT_RESET, > - QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS, > + KFD_UNMAP_LATENCY_MS, > q->pipe, q->queue); > > if (retval) > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h > b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h > index faf820a..99e2305 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h > @@ -29,7 +29,9 @@ > #include "kfd_priv.h" > #include "kfd_mqd_manager.h" > > -#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (500) > +#define KFD_UNMAP_LATENCY_MS (4000) > +#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (2 * KFD_UNMAP_LATENCY_MS + 1000) > + > #define CIK_VMID_NUM 
(8) > #define KFD_VMID_START_OFFSET (8) > #define VMID_PER_DEVICECIK_VMID_NUM > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c > b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c > index 681b639..0c82446 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c > @@ -185,7 +185,7 @@ static void uninitialize(struct kernel_queue *kq) > kq->mqd->destroy_mqd(kq->mqd, > NULL, > false, > - QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS, > + KFD_UNMAP_LATENCY_MS, > kq->queue->pipe, > kq->queue->queue); > else if (kq->queue->properties.type == KFD_QUEUE_TYPE_DIQ) > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c > b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c > index 1d31260..9eda884 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c > @@ -376,7 +376,7 @@ int pm_send_set_resources(struct packet_manager *pm, > packet->bitfields2.queue_type = > > queue_type__mes_set_resources__hsa_interface_queue_hiq; > packet->bitfields2.vmid_mask = res->vmid_mask; > - packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY; > + packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100; > packet->bitfields7.oac_mask = res->oac_mask; > packet->bitfields8.gds_heap_base = res->gds_heap_base; > packet->bitfields8.gds_heap_size = res->gds_heap_size; > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > index f8d6a8e..099dc33 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h > @@ -673,11 +673,8 @@ int amdkfd_fence_wait_timeout(unsigned int *fence_addr, > > /* Packet Manager */ > > -#define KFD_HIQ_TIMEOUT (500) > - > #define KFD_FENCE_COMPLETED (100) > #define KFD_FENCE_INIT (10) > -#define KFD_UNMAP_LATENCY (150) > > struct packet_manager { > struct device_queue_manager *dqm; > -- > 2.7.4 > > ___ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > 
https://lists.freedesktop.org/mailman/listinfo/amd-gfx This patch is: Reviewed-by: Oded Gabbay ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
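The relationship between the two new constants is plain arithmetic and can be checked in isolation. The macros below are copied from the patch; the two helper functions are hypothetical and exist only to make the values testable. The division by 100 suggests that the unmap_latency packet field counts in units of 100 ms, but that unit is an assumption, not something the patch states.

```c
#define KFD_UNMAP_LATENCY_MS			(4000)
#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS	(2 * KFD_UNMAP_LATENCY_MS + 1000)

/* Value the patch writes to packet->bitfields2.unmap_latency. */
static unsigned int unmap_latency_field(void)
{
	return KFD_UNMAP_LATENCY_MS / 100;
}

/* Software timeout: twice the latency plus a 1000 ms buffer. */
static unsigned int preempt_timeout_ms(void)
{
	return QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS;
}
```

With a 4000 ms latency, the packet field becomes 40 and the software timeout becomes 9000 ms, which is the "twice as long as the latency plus some buffer" relationship described in the commit message.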
Re: [PATCH 02/11] drm/amdkfd: Fix suspend/resume issue on Carrizo
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote: > From: Yong Zhao > > When we do suspend/resume through "sudo pm-suspend" while there is > HSA activity running, upon resume we will encounter HWS hanging, which > is caused by memory read/write failures. The root cause is that when > suspend, we neglected to unbind pasid from kfd device. > > Another major change is that the bind/unbinding is changed to be > performed on a per process basis, instead of whether there are queues > in dqm. > > Signed-off-by: Yong Zhao > Signed-off-by: Felix Kuehling > --- > drivers/gpu/drm/amd/amdkfd/kfd_device.c| 22 -- > .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 13 > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 15 +++- > drivers/gpu/drm/amd/amdkfd/kfd_process.c | 89 > ++ > 4 files changed, 101 insertions(+), 38 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index cc8af11..ff3f97c 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -191,7 +191,7 @@ static void iommu_pasid_shutdown_callback(struct pci_dev > *pdev, int pasid) > struct kfd_dev *dev = kfd_device_by_pci_dev(pdev); > > if (dev) > - kfd_unbind_process_from_device(dev, pasid); > + kfd_process_iommu_unbind_callback(dev, pasid); > } > > /* > @@ -339,12 +339,16 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd) > > void kgd2kfd_suspend(struct kfd_dev *kfd) > { > - if (kfd->init_complete) { > - kfd->dqm->ops.stop(kfd->dqm); > - amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL); > - amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL); > - amd_iommu_free_device(kfd->pdev); > - } > + if (!kfd->init_complete) > + return; > + > + kfd->dqm->ops.stop(kfd->dqm); > + > + kfd_unbind_processes_from_device(kfd); > + > + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL); > + amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL); > + amd_iommu_free_device(kfd->pdev); > } > > int kgd2kfd_resume(struct kfd_dev *kfd) > @@ -369,6 
+373,10 @@ static int kfd_resume(struct kfd_dev *kfd)
>  	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>  					iommu_invalid_ppr_cb);
>
> +	err = kfd_bind_processes_to_device(kfd);
> +	if (err)
> +		return -ENXIO;

You need to undo previous initialization in case
kfd_bind_processes_to_device fails, i.e. call amd_iommu_free_device()

> +
>  	err = kfd->dqm->ops.start(kfd->dqm);
>  	if (err) {
>  		dev_err(kfd_device,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 53a66e8..5db82b8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -670,7 +670,6 @@ static int initialize_cpsch(struct device_queue_manager
> *dqm)
>
>  static int start_cpsch(struct device_queue_manager *dqm)
>  {
> -	struct device_process_node *node;
>  	int retval;
>
>  	retval = 0;
> @@ -697,11 +696,6 @@ static int start_cpsch(struct device_queue_manager *dqm)
>
>  	init_interrupts(dqm);
>
> -	list_for_each_entry(node, &dqm->queues, list)
> -		if (node->qpd->pqm->process && dqm->dev)
> -			kfd_bind_process_to_device(dqm->dev,
> -						node->qpd->pqm->process);
> -
>  	execute_queues_cpsch(dqm, true);
>
>  	return 0;
> @@ -714,15 +708,8 @@ static int start_cpsch(struct device_queue_manager *dqm)
>
>  static int stop_cpsch(struct device_queue_manager *dqm)
>  {
> -	struct device_process_node *node;
> -	struct kfd_process_device *pdd;
> -
>  	destroy_queues_cpsch(dqm, true, true);
>
> -	list_for_each_entry(node, &dqm->queues, list) {
> -		pdd = qpd_to_pdd(node->qpd);
> -		pdd->bound = false;
> -	}
>  	kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
>  	pm_uninit(&dqm->packets);
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index b397ec7..ef582cc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -435,6 +435,13 @@ struct qcm_process_device {
>  	uint32_t sh_hidden_private_base;
>  };
>
> +
> +enum kfd_pdd_bound {
> +	PDD_UNBOUND = 0,
> +	PDD_BOUND,
> +	PDD_BOUND_SUSPENDED,
> +};
> +
>  /* Data that is per-process-per device. */
>  struct kfd_process_device {
>  	/*
> @@ -459,7 +466,7 @@ struct kfd_process_device {
>  	uint64_t scratch_limit;
>
>  	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid)
> */
> -	bool bound;
> +	enum kfd_pdd_bound boun
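[Editor's note] The review comment in this message asks that the IOMMU setup be undone when kfd_bind_processes_to_device() fails. The usual kernel idiom for that is a goto-based unwind. The following is only a userspace sketch of the idiom, not the actual fix: every mock_* name is hypothetical, and the assumption that a failed queue-manager start should also free the IOMMU device is mine, not something stated in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kfd_dev; tracks only what the sketch needs. */
struct mock_dev {
	bool iommu_ready;	/* set by mock_iommu_init(), cleared by mock_iommu_free() */
	bool started;		/* set once the mock queue manager has started */
	bool fail_bind;		/* test knob: force the bind step to fail */
	bool fail_start;	/* test knob: force the start step to fail */
};

/* Mock of amd_iommu_init_device(): always succeeds here. */
static int mock_iommu_init(struct mock_dev *d)
{
	d->iommu_ready = true;
	return 0;
}

/* Mock of amd_iommu_free_device(). */
static void mock_iommu_free(struct mock_dev *d)
{
	d->iommu_ready = false;
}

/* Mock of kfd_bind_processes_to_device(). */
static int mock_bind_processes(struct mock_dev *d)
{
	return d->fail_bind ? -EINVAL : 0;
}

/* Mock of dqm->ops.start(). */
static int mock_dqm_start(struct mock_dev *d)
{
	if (d->fail_start)
		return -EINVAL;
	d->started = true;
	return 0;
}

/* Resume path that releases the IOMMU again if a later step fails. */
static int mock_resume(struct mock_dev *d)
{
	int err;

	err = mock_iommu_init(d);
	if (err < 0)
		return -ENXIO;

	err = mock_bind_processes(d);
	if (err)
		goto err_free_iommu;

	err = mock_dqm_start(d);
	if (err)
		goto err_free_iommu;

	return 0;

err_free_iommu:
	mock_iommu_free(d);	/* undo mock_iommu_init() */
	return -ENXIO;
}
```

With a single cleanup label, each later failure point only needs a goto, and the early "return -ENXIO" that the reviewer objected to no longer leaks the initialized IOMMU state.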
Re: [PATCH 01/11] drm/amdkfd: Reorganize kfd resume code
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> The idea is to let kfd init and resume function share the same code path
> as much as possible, rather than to have two copies of almost identical
> code. That way improves the code readability and maintainability.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 78
> +
>  1 file changed, 40 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 61fff25..cc8af11 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -92,6 +92,8 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned
> int buf_size,
>  				unsigned int chunk_size);
>  static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
>
> +static int kfd_resume(struct kfd_dev *kfd);
> +
>  static const struct kfd_device_info *lookup_device_info(unsigned short did)
>  {
>  	size_t i;
> @@ -176,15 +178,8 @@ static bool device_iommu_pasid_init(struct kfd_dev *kfd)
>  			pasid_limit,
>  			kfd->doorbell_process_limit - 1);
>
> -	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -	if (err < 0) {
> -		dev_err(kfd_device, "error initializing iommu device\n");
> -		return false;
> -	}
> -
>  	if (!kfd_set_pasid_limit(pasid_limit)) {
>  		dev_err(kfd_device, "error setting pasid limit\n");
> -		amd_iommu_free_device(kfd->pdev);
>  		return false;
>  	}
>
> @@ -280,29 +275,22 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>  		goto kfd_interrupt_error;
>  	}
>
> -	if (!device_iommu_pasid_init(kfd)) {
> -		dev_err(kfd_device,
> -			"Error initializing iommuv2 for device %x:%x\n",
> -			kfd->pdev->vendor, kfd->pdev->device);
> -		goto device_iommu_pasid_error;
> -	}
> -	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -
> iommu_pasid_shutdown_callback);
> -	amd_iommu_set_invalid_ppr_cb(kfd->pdev, iommu_invalid_ppr_cb);
> -
>  	kfd->dqm = device_queue_manager_init(kfd);
>  	if (!kfd->dqm) {
>  		dev_err(kfd_device, "Error initializing queue manager\n");
>  		goto device_queue_manager_error;
>  	}
>
> -	if (kfd->dqm->ops.start(kfd->dqm)) {
> +	if (!device_iommu_pasid_init(kfd)) {
>  		dev_err(kfd_device,
> -			"Error starting queue manager for device %x:%x\n",
> +			"Error initializing iommuv2 for device %x:%x\n",
>  			kfd->pdev->vendor, kfd->pdev->device);
> -		goto dqm_start_error;
> +		goto device_iommu_pasid_error;
>  	}
>
> +	if (kfd_resume(kfd))
> +		goto kfd_resume_error;
> +
>  	kfd->dbgmgr = NULL;
>
>  	kfd->init_complete = true;
> @@ -314,11 +302,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
>  	goto out;
>
> -dqm_start_error:
> +kfd_resume_error:
> +device_iommu_pasid_error:
>  	device_queue_manager_uninit(kfd->dqm);
>  device_queue_manager_error:
> -	amd_iommu_free_device(kfd->pdev);
> -device_iommu_pasid_error:
>  	kfd_interrupt_exit(kfd);
>  kfd_interrupt_error:
>  	kfd_topology_remove_device(kfd);
> @@ -338,8 +325,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>  void kgd2kfd_device_exit(struct kfd_dev *kfd)
>  {
>  	if (kfd->init_complete) {
> +		kgd2kfd_suspend(kfd);
>  		device_queue_manager_uninit(kfd->dqm);
> -		amd_iommu_free_device(kfd->pdev);
>  		kfd_interrupt_exit(kfd);
>  		kfd_topology_remove_device(kfd);
>  		kfd_doorbell_fini(kfd);
> @@ -362,25 +349,40 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>
>  int kgd2kfd_resume(struct kfd_dev *kfd)
>  {
> -	unsigned int pasid_limit;
> -	int err;
> +	if (!kfd->init_complete)
> +		return 0;
>
> -	pasid_limit = kfd_get_pasid_limit();
> +	return kfd_resume(kfd);
>
> -	if (kfd->init_complete) {
> -		err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -		if (err < 0) {
> -			dev_err(kfd_device, "failed to initialize iommu\n");
> -			return -ENXIO;
> -		}
> +}
>
> -		amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -
> iommu_pasid_shutdown_callback);
> -		amd_iommu_set_invalid_ppr_cb(kfd->pdev, iommu_invalid_ppr_cb);
> -		kfd->dqm->ops.start(kfd->dqm);
> +static int kfd_resume(struct kfd_dev *kfd)
> +{
> +	int err = 0;
> +
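[Editor's note] The quoted patch routes both device init and the external resume entry point through one static kfd_resume() helper. A stripped-down userspace sketch of that shape might look like the following. All mock_* names and the counters are hypothetical and exist only to make the control flow observable; the real helper does IOMMU and queue-manager work that is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kfd_dev. */
struct mock_kfd {
	bool init_complete;	/* mirrors kfd->init_complete in the patch */
	bool running;		/* true while the device is brought up */
	int resume_calls;	/* how often the shared bring-up path ran */
};

/* Shared bring-up path, used by both init and resume. */
static int mock_kfd_resume(struct mock_kfd *kfd)
{
	kfd->resume_calls++;
	kfd->running = true;
	return 0;
}

/* Counterpart of kgd2kfd_suspend(): does nothing before init completes. */
static void mock_kfd_suspend(struct mock_kfd *kfd)
{
	if (kfd->init_complete)
		kfd->running = false;
}

/* Counterpart of kgd2kfd_device_init(): one-time setup, then shared path. */
static bool mock_device_init(struct mock_kfd *kfd)
{
	/* one-time setup (topology, interrupts, queue manager) would go here */
	if (mock_kfd_resume(kfd))
		return false;

	kfd->init_complete = true;
	return true;
}

/* Counterpart of kgd2kfd_resume(): no-op until init has completed. */
static int mock_external_resume(struct mock_kfd *kfd)
{
	if (!kfd->init_complete)
		return 0;

	return mock_kfd_resume(kfd);
}
```

The point of the arrangement is that the bring-up steps exist in exactly one place, so a fix to the resume path is automatically picked up by init as well.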
Re: [PATCH 07/11] drm/amdkfd: Reuse CHIP_* from amdgpu
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> There are already CHIP_* definitions under amd_shared.h file on amdgpu
> side, so KFD should reuse them rather than defining new ones.
>
> Using enum for asic type requires default cases on switch statements
> to prevent compiler warnings. BUG on unsupported ASICs. It should never
> get there because KFD should not be initialized on unsupported devices.

We made an effort to remove all BUG statements from the driver, so please
don't introduce new ones. Even if the code should never get there, that is
not a reason to crash the entire kernel, since it doesn't affect the rest
of the system's functionality.

Oded.

>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 +++--
>  4 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 897ff083..0ecea67 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1132,6 +1132,8 @@ struct device_queue_manager
> *device_queue_manager_init(struct kfd_dev *dev)
>  	case CHIP_KAVERI:
>  		device_queue_manager_init_cik(&dqm->ops_asic_specific);
>  		break;
> +	default:
> +		BUG();

Replace this with some error printing.

>  	}
>
>  	if (!dqm->ops.initialize(dqm))
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 09356d0..9ebb4c1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -291,6 +291,8 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev
> *dev,
>  	case CHIP_KAVERI:
>  		kernel_queue_init_cik(&kq->ops_asic_specific);
>  		break;
> +	default:
> +		BUG();

Replace this with some error printing.

>  	}
>
>  	if (!kq->ops.initialize(kq, dev, type, KFD_KERNEL_QUEUE_SIZE)) {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> index b1ef136..b5a87ba 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> @@ -31,6 +31,8 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
>  		return mqd_manager_init_cik(type, dev);
>  	case CHIP_CARRIZO:
>  		return mqd_manager_init_vi(type, dev);
> +	default:
> +		BUG();

Replace this with some error printing.

>  	}
>
>  	return NULL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 7bed4ef..bb71697 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -33,6 +33,8 @@
>  #include
>  #include
>
> +#include "amd_shared.h"
> +
>  #define KFD_SYSFS_FILE_MODE 0444
>
>  #define KFD_MMAP_DOORBELL_MASK 0x8
> @@ -112,11 +114,6 @@ enum cache_policy {
>  	cache_policy_noncoherent
>  };
>
> -enum asic_family_type {
> -	CHIP_KAVERI = 0,
> -	CHIP_CARRIZO
> -};
> -
>  struct kfd_event_interrupt_class {
>  	bool (*interrupt_isr)(struct kfd_dev *dev,
>  				const uint32_t *ih_ring_entry);
> @@ -125,7 +122,7 @@ struct kfd_event_interrupt_class {
>  };
>
>  struct kfd_device_info {
> -	unsigned int asic_family;
> +	enum amd_asic_type asic_family;
>  	const struct kfd_event_interrupt_class *event_interrupt_class;
>  	unsigned int max_pasid_bits;
>  	unsigned int max_no_of_hqd;
> --
> 2.7.4
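[Editor's note] The review asks for the BUG() in each default case to be replaced with an error print. One way to read that request is sketched below in userspace C: log the unexpected ASIC type and return NULL so the caller can fail initialization gracefully. The mock_* names and enum values are hypothetical, and fprintf() merely stands in for a kernel logging call such as pr_err() or dev_err(); the thread does not say which one was eventually used.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of an ASIC type enum, for illustration only. */
enum mock_asic_type {
	MOCK_CHIP_KAVERI,
	MOCK_CHIP_CARRIZO,
	MOCK_CHIP_UNSUPPORTED,
};

/* Hypothetical stand-in for struct mqd_manager. */
struct mock_mqd_manager {
	const char *name;
};

static struct mock_mqd_manager mock_cik_mgr = { "cik" };
static struct mock_mqd_manager mock_vi_mgr = { "vi" };

/* Returns the manager for a supported ASIC, or NULL after logging an error. */
static struct mock_mqd_manager *mock_mqd_manager_init(enum mock_asic_type asic)
{
	switch (asic) {
	case MOCK_CHIP_KAVERI:
		return &mock_cik_mgr;
	case MOCK_CHIP_CARRIZO:
		return &mock_vi_mgr;
	default:
		/* Report and fail instead of crashing; kernel code would log here. */
		fprintf(stderr, "mock kfd: unsupported asic type %d\n", (int)asic);
		return NULL;
	}
}
```

The default case still silences the compiler warning about unhandled enum values, which was the reason the patch added it, but an unsupported device now only fails its own initialization.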
Re: [PATCH 11/11] drm/amdkfd: Set /dev/kfd permissions to 0666 by default
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> From: Andres Rodriguez
>
> Set the default permissions of /dev/kfd to be more than just root
> accessible 600.
>

I don't think that's acceptable. You need to use a udev rules file for that.

Oded

> Signed-off-by: Andres Rodriguez
> Reviewed-by: Felix Kuehling
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index e4a8c2e..1ad9901 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -55,6 +55,14 @@ static int kfd_char_dev_major = -1;
>  static struct class *kfd_class;
>  struct device *kfd_device;
>
> +static char *kfd_devnode(struct device *dev, umode_t *mode)
> +{
> +	if (mode && dev->devt == MKDEV(kfd_char_dev_major, 0))
> +		*mode = 0666;
> +
> +	return NULL;
> +}
> +
>  int kfd_chardev_init(void)
>  {
>  	int err = 0;
> @@ -69,6 +77,8 @@ int kfd_chardev_init(void)
>  	if (IS_ERR(kfd_class))
>  		goto err_class_create;
>
> +	kfd_class->devnode = kfd_devnode;
> +
>  	kfd_device = device_create(kfd_class, NULL,
>  					MKDEV(kfd_char_dev_major, 0),
>  					NULL, kfd_dev_name);
> --
> 2.7.4