[PATCH 16/18] drm/amdgpu: increase mailbox polling timeout to 12s.

2017-09-17 Thread Monk Liu
From: Horace Chen 

Because multiple FLRs (function-level resets) may be pending at once,
waiting for a mailbox event can take a long time. Raise the polling
timeout from 5s to 12s to reduce timeout failures.
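
To illustrate the effect of the larger value, here is a minimal,
self-contained sketch of a polling loop bounded by a timeout. It is not
the driver's mailbox code: poll_mailbox(), mailbox_ready_fn,
ready_after() and the step size are hypothetical, time is simulated with
a counter, and both constants are assumed to be in milliseconds (5000
and 12000 are the only values taken from the patch).

```c
#include <stdbool.h>

/* Values from the patch, assumed to be in milliseconds. */
#define OLD_MAILBOX_TIMEOUT_MS 5000
#define NEW_MAILBOX_TIMEOUT_MS 12000

/* Hypothetical callback: returns true once the event has arrived,
 * given how many milliseconds have elapsed so far. */
typedef bool (*mailbox_ready_fn)(int elapsed_ms, void *ctx);

/* Poll every step_ms until ready() succeeds or timeout_ms has elapsed.
 * Returns 0 on success and -1 on timeout. Time is simulated by a
 * counter; a real driver would sleep between polls. */
static int poll_mailbox(mailbox_ready_fn ready, void *ctx,
                        int timeout_ms, int step_ms)
{
    int elapsed;

    for (elapsed = 0; elapsed <= timeout_ms; elapsed += step_ms) {
        if (ready(elapsed, ctx))
            return 0;
    }
    return -1;
}

/* Simulated host: the event arrives after *ctx milliseconds. */
static bool ready_after(int elapsed_ms, void *ctx)
{
    return elapsed_ms >= *(int *)ctx;
}
```

With a simulated host that answers after 8000 ms, the call using the old
limit times out while the call using the new limit succeeds.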

Change-Id: I6b33170ba7dedf781b99ba6095127efce403af81
Signed-off-by: Horace Chen 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
index 1e91b9a..67e7857 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
@@ -24,7 +24,7 @@
 #ifndef __MXGPU_AI_H__
 #define __MXGPU_AI_H__
 
-#define AI_MAILBOX_TIMEDOUT	5000
+#define AI_MAILBOX_TIMEDOUT	12000
 
 enum idh_request {
 	IDH_REQ_GPU_INIT_ACCESS = 1,
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
index c791d73..f13dc6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
@@ -23,7 +23,7 @@
 #ifndef __MXGPU_VI_H__
 #define __MXGPU_VI_H__
 
-#define VI_MAILBOX_TIMEDOUT	5000
+#define VI_MAILBOX_TIMEDOUT	12000
 #define VI_MAILBOX_RESET_TIME  12
 
 /* VI mailbox messages request */
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 17/18] drm/amdgpu:fix uvd ring fini routine

2017-09-17 Thread Monk Liu
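Fix two problems in the UVD ring fini routine: the UVD encode rings
(ring_enc) were never finished, and the UVD ring was finished even when
it had not been initialized.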
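
The guarded teardown can be sketched in isolation as below. This is a
simplified model, not the driver code: struct ring, struct uvd,
ring_fini(), uvd_fini_rings() and MAX_ENC_RINGS are hypothetical
stand-ins, and the owner pointer plays the role of ring->adev (non-NULL
only once the ring has been initialized).

```c
#include <stddef.h>

#define MAX_ENC_RINGS 2 /* hypothetical stand-in for AMDGPU_MAX_UVD_ENC_RINGS */

struct ring {
    void *owner;     /* NULL until the ring is initialized (ring->adev in the patch) */
    int fini_calls;  /* instrumentation for this example only */
};

struct uvd {
    struct ring ring;
    struct ring ring_enc[MAX_ENC_RINGS];
};

/* Tear down one ring and mark it as no longer initialized. */
static void ring_fini(struct ring *ring)
{
    ring->owner = NULL;
    ring->fini_calls++;
}

/* Finish the decode ring and every encode ring, but only those that
 * were actually initialized. Returns the number of rings finished. */
static int uvd_fini_rings(struct uvd *uvd)
{
    int i, count = 0;

    if (uvd->ring.owner) {
        ring_fini(&uvd->ring);
        count++;
    }

    for (i = 0; i < MAX_ENC_RINGS; ++i) {
        struct ring *ring = &uvd->ring_enc[i];

        if (ring->owner) {
            ring_fini(ring);
            count++;
        }
    }
    return count;
}
```

Calling uvd_fini_rings() a second time is harmless in this model because
every finished ring has its owner cleared.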

Change-Id: Ib74237ca5adcb3b128c9b751fced0b7db7b09e86
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 331e34a..63b00eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -269,6 +269,8 @@ int amdgpu_uvd_sw_init(struct amdgpu_device *adev)
 
 int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
 {
+	struct amdgpu_ring *ring;
+	int i;
 	kfree(adev->uvd.saved_bo);
 
 	amd_sched_entity_fini(&adev->uvd.ring.sched, &adev->uvd.entity);
@@ -277,7 +279,15 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
  &adev->uvd.gpu_addr,
  (void **)&adev->uvd.cpu_addr);
 
-	amdgpu_ring_fini(&adev->uvd.ring);
+	ring = &adev->uvd.ring;
+	if (ring->adev)
+		amdgpu_ring_fini(ring);
+
+	for (i = 0; i < AMDGPU_MAX_UVD_ENC_RINGS; ++i) {
+		ring = &adev->uvd.ring_enc[i];
+		if (ring->adev)
+			amdgpu_ring_fini(ring);
+	}
 
 	release_firmware(adev->uvd.fw);
 
-- 
2.7.4



[PATCH 08/18] drm/amdgpu:halt when vm fault

2017-09-17 Thread Monk Liu
Halt on VM fault; this is the only way we can debug the VMC page fault
issue.
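
As a rough illustration of what the added REG_SET_FIELD calls do, the
sketch below sets two single-bit fields in a 32-bit register value when
the default fault handling is disabled. The bit positions, set_field()
and apply_fault_default() are hypothetical; the real layout of
VM_L2_PROTECTION_FAULT_CNTL comes from the register headers and is not
reproduced here.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define CRASH_ON_NO_RETRY_FAULT_SHIFT 0
#define CRASH_ON_NO_RETRY_FAULT_MASK  (1u << CRASH_ON_NO_RETRY_FAULT_SHIFT)
#define CRASH_ON_RETRY_FAULT_SHIFT    1
#define CRASH_ON_RETRY_FAULT_MASK     (1u << CRASH_ON_RETRY_FAULT_SHIFT)

/* Replace the bits selected by mask with val, leaving the rest intact. */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
                          uint32_t val)
{
    return (reg & ~mask) | ((val << shift) & mask);
}

/* Mirror the shape of the patch: when value is false, also request a
 * halt on both retry and no-retry faults. */
static uint32_t apply_fault_default(uint32_t reg, int value)
{
    if (!value) {
        reg = set_field(reg, CRASH_ON_NO_RETRY_FAULT_MASK,
                        CRASH_ON_NO_RETRY_FAULT_SHIFT, 1);
        reg = set_field(reg, CRASH_ON_RETRY_FAULT_MASK,
                        CRASH_ON_RETRY_FAULT_SHIFT, 1);
    }
    return reg;
}
```

The read-modify-write form keeps every other field of the register
unchanged, which is why the patch can add the two fields without
touching the existing ones.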

Change-Id: Ifc8373c3c3c40d54ae94dedf1be74d6314faeb10
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 6 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 7 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 6c8040e..c17996e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -319,6 +319,12 @@ void gfxhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev,
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
 	WREG32_SOC15(GC, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 7ff7076..cc21c4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -561,6 +561,13 @@ void mmhub_v1_0_set_fault_enable_default(struct amdgpu_device *adev, bool value)
 			WRITE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
 	tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
 			EXECUTE_PROTECTION_FAULT_ENABLE_DEFAULT, value);
+	if (!value) {
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_NO_RETRY_FAULT, 1);
+		tmp = REG_SET_FIELD(tmp, VM_L2_PROTECTION_FAULT_CNTL,
+				CRASH_ON_RETRY_FAULT, 1);
+	}
+
 	WREG32_SOC15(MMHUB, 0, mmVM_L2_PROTECTION_FAULT_CNTL, tmp);
 }
 
-- 
2.7.4



[PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV

2017-09-17 Thread Monk Liu
From: Horace Chen 

The kernel sets the PCI power state to UNKNOWN after the driver is
unloaded. Because SRIOV exposes a faked PCI config space, the UNKNOWN
state is then kept forever.

On driver reload, enabling MSI fails if the power state is UNKNOWN.

Force the state to D0 for SRIOV to work around this kernel flaw.

Change-Id: I6a72d5fc9b653b21c3c98167515a511c5edeb91c
Signed-off-by: Horace Chen 
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 914c5bf..345406a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -229,7 +229,15 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	adev->irq.msi_enabled = false;
 
 	if (amdgpu_msi_ok(adev)) {
-		int ret = pci_enable_msi(adev->pdev);
+		int ret;
+		if (amdgpu_sriov_vf(adev) &&
+		    adev->pdev->current_state == PCI_UNKNOWN){
+			/* If pci power state is unknown on the SRIOV platform,
+			 * it may be set in the remove device. We need to forcibly
+			 * set it to D0 to enable the msi*/
+			adev->pdev->current_state = PCI_D0;
+		}
+		ret = pci_enable_msi(adev->pdev);
 		if (!ret) {
 			adev->irq.msi_enabled = true;
 			dev_info(adev->dev, "amdgpu: using MSI.\n");
-- 
2.7.4



[PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9

2017-09-17 Thread Monk Liu
Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3306667..f201510 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3499,6 +3499,17 @@ static void gfx_v9_0_ring_set_wptr_gfx(struct amdgpu_ring *ring)
 	}
 }
 
+static void gfx_v9_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VS_PARTIAL_FLUSH) |
+		EVENT_INDEX(4));
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_EVENT_WRITE, 0));
+	amdgpu_ring_write(ring, EVENT_TYPE(VGT_FLUSH) |
+		EVENT_INDEX(0));
+}
+
 static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 {
u32 ref_and_mask, reg_mem_engine;
@@ -3530,6 +3541,9 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 			      nbio_hf_reg->hdp_flush_req_offset,
 			      nbio_hf_reg->hdp_flush_done_offset,
 			      ref_and_mask, ref_and_mask, 0x20);
+
+	if (ring->funcs->type == AMDGPU_RING_TYPE_GFX)
+		gfx_v9_0_ring_emit_vgt_flush(ring);
 }
 
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
-- 
2.7.4



[PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint

2017-09-17 Thread Monk Liu
Change-Id: I3a43901f5757b9fab629824a74ad9a4770a47b38
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7ca9cbe..7a20ba8 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -59,16 +59,16 @@
 
 static const u32 golden_settings_vega10_hdp[] =
 {
-   0xf64, 0x0fff, 0x,
-   0xf65, 0x0fff, 0x,
-   0xf66, 0x0fff, 0x,
-   0xf67, 0x0fff, 0x,
-   0xf68, 0x0fff, 0x,
-   0xf6a, 0x0fff, 0x,
-   0xf6b, 0x0fff, 0x,
-   0xf6c, 0x0fff, 0x,
-   0xf6d, 0x0fff, 0x,
-   0xf6e, 0x0fff, 0x,
+   0xf64, 0x0fff, 0x,//surface0_low_bound
+   0xf65, 0x0fff, 0x,//surface0_upper_bound
+   0xf66, 0x0fff, 0x,//surface0_base
+   0xf67, 0x0fff, 0x,//surface0_info
+   0xf68, 0x0fff, 0x,//surface0_base_hi
+   0xf6a, 0x0fff, 0x,//surface1_low_bound
+   0xf6b, 0x0fff, 0x,//surface1_upper_bound
+   0xf6c, 0x0fff, 0x,//surface1_base
+   0xf6d, 0x0fff, 0x,//surface1_info
+   0xf6e, 0x0fff, 0x,//surface1_base_hi
 };
 
 static int gmc_v9_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
-- 
2.7.4



[PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename

2017-09-17 Thread Monk Liu
Currently in_reset is only used by the SRIOV GPU reset, and it will
later be needed by other non-gfx hardware components such as PSP. Move
it from gfx to adev and rename it to in_sriov_reset, which makes more
sense.

Change-Id: Ibb8546f6e4635a1cca740e57f6244f158c70a1e6
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 6 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 6 +++---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a34c4cb..cc9a232 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1019,7 +1019,6 @@ struct amdgpu_gfx {
 	/* reset mask */
 	uint32_t			grbm_soft_reset;
 	uint32_t			srbm_soft_reset;
-	bool				in_reset;
 	/* s3/s4 mask */
 	bool				in_suspend;
 	/* NGG */
@@ -1588,6 +1587,7 @@ struct amdgpu_device {
 
/* record last mm index being written through WREG32*/
unsigned long last_mm_index;
+	bool				in_sriov_reset;
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3467179..298a241 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2757,7 +2757,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
 
mutex_lock(&adev->virt.lock_reset);
atomic_inc(&adev->gpu_reset_counter);
-   adev->gfx.in_reset = true;
+   adev->in_sriov_reset = true;
 
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
@@ -2868,7 +2868,7 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job *job)
dev_info(adev->dev, "GPU reset successed!\n");
}
 
-   adev->gfx.in_reset = false;
+   adev->in_sriov_reset = false;
mutex_unlock(&adev->virt.lock_reset);
return r;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 6ee348e..3f511a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4810,7 +4810,7 @@ static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
 
gfx_v8_0_kiq_setting(ring);
 
-   if (adev->gfx.in_reset) { /* for GPU_RESET case */
+   if (adev->in_sriov_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
@@ -4847,7 +4847,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
struct vi_mqd *mqd = ring->mqd_ptr;
int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-   if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+   if (!adev->in_sriov_reset && !adev->gfx.in_suspend) {
memset((void *)mqd, 0, sizeof(struct vi_mqd_allocation));
((struct vi_mqd_allocation *)mqd)->dynamic_cu_mask = 0x;
((struct vi_mqd_allocation *)mqd)->dynamic_rb_mask = 0x;
@@ -4859,7 +4859,7 @@ static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 
 		if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, sizeof(struct vi_mqd_allocation));
-   } else if (adev->gfx.in_reset) { /* for GPU_RESET case */
+   } else if (adev->in_sriov_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct vi_mqd_allocation));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index c133c85..21838f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2698,7 +2698,7 @@ static int gfx_v9_0_kiq_init_queue(struct amdgpu_ring *ring)
 
gfx_v9_0_kiq_setting(ring);
 
-   if (adev->gfx.in_reset) { /* for GPU_RESET case */
+   if (adev->in_sriov_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
if (adev->gfx.mec.mqd_backup[mqd_idx])
 			memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], sizeof(struct v9_mqd_allocation));
@@ -2736,7 +2736,7 @@ static int gfx_v9_0_kcq_init_queue(struct amdgpu_ring *ring)
struct v9_mqd *mqd = ring->mqd_ptr;
int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
-   if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
+   if (!a

[PATCH 00/18] *** misc patches for SRIOV ***

2017-09-17 Thread Monk Liu
Found a lot of patches that are missing from 4.12 staging.

Horace Chen (2):
  drm/amdgpu: Fix amdgpu reload failure under SRIOV
  drm/amdgpu: increase mailbox polling timeout to 12s.

Monk Liu (16):
  drm/amdgpu/sriov:fix missing error handling
  drm/amdgpu:no kiq in IH
  drm/amdgpu/sriov:move in_reset to adev and rename
  drm/amdgpu/sriov:don't load psp fw during gpu reset
  drm/amdgpu:make ctx_add_fence interruptible
  drm/amdgpu/sriov:fix memory leak after gpu reset
  drm/amdgpu:add hdp golden setting register name hint
  drm/amdgpu:halt when vm fault
  drm/amdgpu:insert TMZ_BEGIN
  drm/amdgpu:hdp flush should be put after it is initialized
  drm/amdgpu:add vgt_flush for gfx9
  drm/amdgpu:use formal register to trigger hdp invalidate
  drm/amdgpu:fix driver unloading bug
  drm/amdgpu/sriov: fix page fault issue of driver unload
  drm/amdgpu:fix uvd ring fini routine
  drm/amdgpu/sriov:init csb for gfxv9

 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   9 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c|  14 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   8 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|   5 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c|  10 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c|  15 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c  |  64 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c|  12 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |   7 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 100 +
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c   |   6 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  32 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c|   7 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h  |   2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |   2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c |   4 +-
 20 files changed, 226 insertions(+), 93 deletions(-)

-- 
2.7.4



[PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9

2017-09-17 Thread Monk Liu
Under SRIOV, the RLC needs the CSB registers to be initialized for
world switch; otherwise the clear state buffer behavior is not restored
to the current VF's scheme after switching back.

Change-Id: I3afd82875564c233060b740724bd8031095780f6
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index a577bbc..8d677cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2044,8 +2044,10 @@ static int gfx_v9_0_rlc_resume(struct amdgpu_device *adev)
 {
 	int r;
 
-	if (amdgpu_sriov_vf(adev))
+	if (amdgpu_sriov_vf(adev)) {
+		gfx_v9_0_init_csb(adev);
 		return 0;
+	}
 
 	gfx_v9_0_rlc_stop(adev);
 
-- 
2.7.4



[PATCH 13/18] drm/amdgpu:fix driver unloading bug

2017-09-17 Thread Monk Liu
[SWDEV-126631] - fix the hypervisor save_vf failure that occurred
after the driver was removed:
1. Because the KIQ and KCQ were not unmapped, save_vf fails if the
   driver has freed the mqd of the KIQ and KCQ.
2. The KIQ can't be unmapped since RLCV always needs it, so the bo_free
   on the KIQ should be skipped.
3. The KCQ can be unmapped, and should be unmapped during hw_fini.
4. RLCV still needs to access other mc addresses from some hw even after
   the driver is unloaded, so we should not unbind gart for VF.
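
The completion check used when unmapping a KCQ follows a common
pattern: seed a scratch location with one sentinel, ask the hardware to
overwrite it with another once the queued work is done, then poll with a
bounded number of iterations. Below is a minimal sketch of that pattern
against a simulated device. struct fake_hw, read_scratch() and
wait_for_completion() are hypothetical; only the two sentinel values
match the ones in the patch.

```c
#include <stdint.h>

#define SCRATCH_INIT 0xCAFEDEADu /* written by the driver before submitting */
#define SCRATCH_DONE 0xDEADBEEFu /* written by the hardware when it is done */

/* Simulated device: completes after a given number of reads. */
struct fake_hw {
    uint32_t scratch;
    int polls_until_done; /* negative: the device never completes */
};

/* Simulated register read; the device finishes once the countdown
 * reaches zero. */
static uint32_t read_scratch(struct fake_hw *hw)
{
    if (hw->polls_until_done == 0)
        hw->scratch = SCRATCH_DONE;
    else if (hw->polls_until_done > 0)
        hw->polls_until_done--;
    return hw->scratch;
}

/* Seed the scratch value, then poll up to max_polls times.
 * Returns 0 if the completion sentinel was seen, -1 on timeout.
 * A real driver would delay (for example 1 us) between reads. */
static int wait_for_completion(struct fake_hw *hw, int max_polls)
{
    int i;

    hw->scratch = SCRATCH_INIT;
    for (i = 0; i < max_polls; i++) {
        if (read_scratch(hw) == SCRATCH_DONE)
            return 0;
    }
    return -1;
}
```

Using two distinct sentinels lets a timeout be told apart from stale
data: if the loop expires, the scratch location still holds the value
the driver wrote.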

Change-Id: I320487a9a848f41484c5f8cc11be34aca807b424
Signed-off-by: Horace Chen 
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c  |  5 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 60 +++-
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index f437008..2fee071 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -394,7 +394,8 @@ int amdgpu_gart_init(struct amdgpu_device *adev)
  */
 void amdgpu_gart_fini(struct amdgpu_device *adev)
 {
-   if (adev->gart.ready) {
+   /* gart is still used by other hw under SRIOV, don't unbind it */
+   if (adev->gart.ready && !amdgpu_sriov_vf(adev)) {
/* unbind pages */
amdgpu_gart_unbind(adev, 0, adev->gart.num_cpu_pages);
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4f6c68f..bf6656f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -309,6 +309,11 @@ void amdgpu_gfx_compute_mqd_sw_fini(struct amdgpu_device *adev)
  &ring->mqd_ptr);
}
 
+   /* don't deallocate KIQ mqd because the bo is still used by RLCV even
+   the guest VM is shutdown */
+   if (amdgpu_sriov_vf(adev))
+   return;
+
ring = &adev->gfx.kiq.ring;
kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
amdgpu_bo_free_kernel(&ring->mqd_obj,
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 44960b3..a577bbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -2892,14 +2892,72 @@ static int gfx_v9_0_hw_init(void *handle)
return r;
 }
 
+static int gfx_v9_0_kcq_disable(struct amdgpu_ring *kiq_ring,struct amdgpu_ring *ring)
+{
+   struct amdgpu_device *adev = kiq_ring->adev;
+   uint32_t scratch, tmp = 0;
+   int r, i;
+
+   r = amdgpu_gfx_scratch_get(adev, &scratch);
+   if (r) {
+   DRM_ERROR("Failed to get scratch reg (%d).\n", r);
+   return r;
+   }
+   WREG32(scratch, 0xCAFEDEAD);
+
+   r = amdgpu_ring_alloc(kiq_ring, 10);
+   if (r) {
+   DRM_ERROR("Failed to lock KIQ (%d).\n", r);
+   amdgpu_gfx_scratch_free(adev, scratch);
+   return r;
+   }
+
+   /* unmap queues */
+	amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_UNMAP_QUEUES, 4));
+	amdgpu_ring_write(kiq_ring, /* Q_sel: 0, vmid: 0, engine: 0, num_Q: 1 */
+			  PACKET3_UNMAP_QUEUES_ACTION(1) | /* RESET_QUEUES */
+			  PACKET3_UNMAP_QUEUES_QUEUE_SEL(0) |
+			  PACKET3_UNMAP_QUEUES_ENGINE_SEL(0) |
+			  PACKET3_UNMAP_QUEUES_NUM_QUEUES(1));
+	amdgpu_ring_write(kiq_ring, PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
+   amdgpu_ring_write(kiq_ring, 0);
+   amdgpu_ring_write(kiq_ring, 0);
+   amdgpu_ring_write(kiq_ring, 0);
+   /* write to scratch for completion */
+   amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_UCONFIG_REG, 1));
+   amdgpu_ring_write(kiq_ring, (scratch - PACKET3_SET_UCONFIG_REG_START));
+   amdgpu_ring_write(kiq_ring, 0xDEADBEEF);
+   amdgpu_ring_commit(kiq_ring);
+
+   for (i = 0; i < adev->usec_timeout; i++) {
+   tmp = RREG32(scratch);
+   if (tmp == 0xDEADBEEF)
+   break;
+   DRM_UDELAY(1);
+   }
+   if (i >= adev->usec_timeout) {
+		DRM_ERROR("KCQ disabled failed (scratch(0x%04X)=0x%08X)\n", scratch, tmp);
+   r = -EINVAL;
+   }
+   amdgpu_gfx_scratch_free(adev, scratch);
+   return r;
+}
+
+
 static int gfx_v9_0_hw_fini(void *handle)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   int i, r;
 
amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
if (amdgpu_sriov_vf(adev)) {
-   pr_debug("For SRIOV client, shouldn't do anything.\n");
+   /* disable KCQ to avoid CPC touch memory not valid anymore */
+ 

[PATCH 10/18] drm/amdgpu:hdp flush should be put after it is initialized

2017-09-17 Thread Monk Liu
Change-Id: I635271ba4c89189017daa302a7fe5cd65c3eef06
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7a20ba8..3d035a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -696,12 +696,6 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
if (r)
return r;
 
-   /* After HDP is initialized, flush HDP.*/
-   if (adev->flags & AMD_IS_APU)
-   nbio_v7_0_hdp_flush(adev);
-   else
-   nbio_v6_1_hdp_flush(adev);
-
switch (adev->asic_type) {
case CHIP_RAVEN:
mmhub_v1_0_initialize_power_gating(adev);
@@ -724,6 +718,12 @@ static int gmc_v9_0_gart_enable(struct amdgpu_device *adev)
tmp = RREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL);
WREG32_SOC15(HDP, 0, mmHDP_HOST_PATH_CNTL, tmp);
 
+   /* After HDP is initialized, flush HDP.*/
+   if (adev->flags & AMD_IS_APU)
+   nbio_v7_0_hdp_flush(adev);
+   else
+   nbio_v6_1_hdp_flush(adev);
+
if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_ALWAYS)
value = false;
else
-- 
2.7.4



[PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

2017-09-17 Thread Monk Liu
Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index f201510..44960b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3549,7 +3549,7 @@ static void gfx_v9_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
 static void gfx_v9_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
gfx_v9_0_write_data_to_reg(ring, 0, true,
-  SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0), 1);
+  SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE), 1);
 }
 
 static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index fd7c72a..d5f3848 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -398,7 +398,7 @@ static void sdma_v4_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_SRBM_WRITE) |
  SDMA_PKT_SRBM_WRITE_HEADER_BYTE_EN(0xf));
-   amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_DEBUG0));
+   amdgpu_ring_write(ring, SOC15_REG_OFFSET(HDP, 0, mmHDP_READ_CACHE_INVALIDATE));
amdgpu_ring_write(ring, 1);
 }
 
-- 
2.7.4



[PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload

2017-09-17 Thread Monk Liu
Freeing the CSA bo in amdgpu_fini is too late because TTM is already
finished by that time. Move the bo_free earlier to avoid the page fault.

Change-Id: Id9c3f6aa8720cabbc9936ce21d8cf98af6e23bee
Signed-off-by: Monk Liu 
Signed-off-by: Horace Chen 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 1 +
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 298a241..e0a17bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1795,10 +1795,8 @@ static int amdgpu_fini(struct amdgpu_device *adev)
adev->ip_blocks[i].status.late_initialized = false;
}
 
-   if (amdgpu_sriov_vf(adev)) {
-   amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
+   if (amdgpu_sriov_vf(adev))
amdgpu_virt_release_full_gpu(adev, false);
-   }
 
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 3f511a9..40e5865 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2113,6 +2113,7 @@ static int gfx_v8_0_sw_fini(void *handle)
amdgpu_gfx_compute_mqd_sw_fini(adev);
amdgpu_gfx_kiq_free_ring(&adev->gfx.kiq.ring, &adev->gfx.kiq.irq);
amdgpu_gfx_kiq_fini(adev);
+   amdgpu_bo_free_kernel(&adev->virt.csa_obj, &adev->virt.csa_vmid0_addr, NULL);
 
gfx_v8_0_mec_fini(adev);
gfx_v8_0_rlc_fini(adev);
-- 
2.7.4



[PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN

2017-09-17 Thread Monk Liu
FRAME_CONTROL(begin) is needed for vega10 due to a ucode logic change.
It fixes some random CTS failures when gfx preemption is enabled.

Change-Id: I0442337f6cde13ed2a33f033badcb522e0f35e2d
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 21838f4..3306667 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3764,6 +3764,12 @@ static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring)
 	amdgpu_ring_write_multiple(ring, (void *)&de_payload, sizeof(de_payload) >> 2);
 }
 
+static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+   amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+   amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+
 static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
 {
uint32_t dw2 = 0;
@@ -3771,6 +3777,8 @@ static void gfx_v9_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags)
if (amdgpu_sriov_vf(ring->adev))
gfx_v9_0_ring_emit_ce_meta(ring);
 
+   gfx_v9_0_ring_emit_tmz(ring, true);
+
 	dw2 |= 0x8000; /* set load_enable otherwise this package is just NOPs */
if (flags & AMDGPU_HAVE_CTX_SWITCH) {
/* set load_global_config & load_global_uconfig */
@@ -3821,12 +3829,6 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
 }
 
-static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
-{
-   amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
-   amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
-}
-
 static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
struct amdgpu_device *adev = ring->adev;
-- 
2.7.4



[PATCH 01/18] drm/amdgpu/sriov:fix missing error handling

2017-09-17 Thread Monk Liu
Change-Id: Ifc6942ed0221f3134bfba4d66fde743484191da3
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e390c01..d1ac27d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -841,8 +841,11 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	if (amdgpu_sriov_vf(adev)) {
 		r = amdgpu_map_static_csa(adev, &fpriv->vm, &fpriv->csa_va);
-		if (r)
+		if (r) {
+			amdgpu_vm_fini(adev, &fpriv->vm);
+			kfree(fpriv);
 			goto out_suspend;
+		}
 	}
 
 	mutex_init(&fpriv->bo_list_lock);
-- 
2.7.4



[PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset

2017-09-17 Thread Monk Liu
Doing a GPU reset reruns all hw_init callbacks, so ucode_init_bo is
invoked again. Skip the fw_buf allocation during SRIOV GPU reset to
avoid a memory leak.
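
The idea can be sketched outside the driver as follows: keep the buffer
pointer in long-lived state, allocate only on the first initialization,
and merely clear and reuse the buffer when the init path runs again
during a reset. This is a simplified user-space model, not the driver
code: struct fw_state and fw_init() are hypothetical, malloc() stands in
for the bo create/pin/kmap sequence, and in_reset plays the role of
adev->in_sriov_reset.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct fw_state {
    void *buf;        /* kept across resets, like fw_buf_ptr in the patch */
    size_t size;
    bool in_reset;    /* stand-in for adev->in_sriov_reset */
    int alloc_count;  /* instrumentation for this example only */
};

/* Initialize the firmware buffer. Allocation is skipped while a reset
 * is in progress, so rerunning init does not leak the earlier buffer.
 * Returns 0 on success and -1 on failure. */
static int fw_init(struct fw_state *fw)
{
    if (!fw->in_reset) {
        fw->buf = malloc(fw->size);
        if (!fw->buf)
            return -1;
        fw->alloc_count++;
    }

    /* During reset there must already be a buffer to reuse. */
    if (!fw->buf)
        return -1;

    memset(fw->buf, 0, fw->size);
    return 0;
}
```

Storing the pointer in persistent state rather than in a local variable
is what makes the reuse possible, which is why the patch moves
fw_buf_ptr and fw_buf_mc into struct amdgpu_firmware.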

Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 64 +++
 2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6ff2959..3d0c633 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1185,6 +1185,9 @@ struct amdgpu_firmware {
 
/* gpu info firmware data pointer */
const struct firmware *gpu_info_fw;
+
+   void *fw_buf_ptr;
+   uint64_t fw_buf_mc;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index f306374..6564902 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -360,8 +360,6 @@ static int amdgpu_ucode_patch_jt(struct amdgpu_firmware_info *ucode,
 int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
 {
struct amdgpu_bo **bo = &adev->firmware.fw_buf;
-   uint64_t fw_mc_addr;
-   void *fw_buf_ptr = NULL;
uint64_t fw_offset = 0;
int i, err;
struct amdgpu_firmware_info *ucode = NULL;
@@ -372,37 +370,39 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
return 0;
}
 
-	err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
-				amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
-				NULL, NULL, 0, bo);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
-		goto failed;
-	}
+	if (!amdgpu_sriov_vf(adev) || !adev->in_sriov_reset) {
+		err = amdgpu_bo_create(adev, adev->firmware.fw_size, PAGE_SIZE, true,
+					amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS,
+					NULL, NULL, 0, bo);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer allocate failed\n", err);
+			goto failed;
+		}
 
-	err = amdgpu_bo_reserve(*bo, false);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
-		goto failed_reserve;
-	}
+		err = amdgpu_bo_reserve(*bo, false);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer reserve failed\n", err);
+			goto failed_reserve;
+		}
 
-	err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
-				&fw_mc_addr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
-		goto failed_pin;
-	}
+		err = amdgpu_bo_pin(*bo, amdgpu_sriov_vf(adev) ? AMDGPU_GEM_DOMAIN_VRAM : AMDGPU_GEM_DOMAIN_GTT,
+					&adev->firmware.fw_buf_mc);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer pin failed\n", err);
+			goto failed_pin;
+		}
 
-	err = amdgpu_bo_kmap(*bo, &fw_buf_ptr);
-	if (err) {
-		dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
-		goto failed_kmap;
-	}
+		err = amdgpu_bo_kmap(*bo, &adev->firmware.fw_buf_ptr);
+		if (err) {
+			dev_err(adev->dev, "(%d) Firmware buffer kmap failed\n", err);
+			goto failed_kmap;
+		}
 
-	amdgpu_bo_unreserve(*bo);
+		amdgpu_bo_unreserve(*bo);
+	}
 
-	memset(fw_buf_ptr, 0, adev->firmware.fw_size);
+	memset(adev->firmware.fw_buf_ptr, 0, adev->firmware.fw_size);
 
/*
 * if SMU loaded firmware, it needn't add SMC, UVD, and VCE
@@ -421,14 +421,14 @@ int amdgpu_ucode_init_bo(struct amdgpu_device *adev)
ucode = &adev->firmware.ucode[i];
if (ucode->fw) {
header = (const struct common_firmware_header 
*)ucode->fw->data;
-   amdgpu_ucode_init_single_fw(adev, ucode, fw_mc_addr + 
fw_offset,
-   (void *)((uint8_t 
*)fw_buf_ptr + fw_offset));
+   amdgpu_ucode_init_single_fw(adev, ucode, 
adev->firmware.fw_buf_mc + fw_offset,
+   adev->firmware.fw_buf_ptr + 
fw

[PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset

2017-09-17 Thread Monk Liu
At least for SRIOV, we found that reloading the PSP fw during
gpu reset causes a PSP hang.

Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 8a1ee97..4eee2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -253,15 +253,18 @@ static int psp_asd_load(struct psp_context *psp)
 
 static int psp_hw_start(struct psp_context *psp)
 {
+   struct amdgpu_device *adev = psp->adev;
int ret;
 
-   ret = psp_bootloader_load_sysdrv(psp);
-   if (ret)
-   return ret;
+   if (amdgpu_sriov_vf(adev) && !adev->in_sriov_reset) {
+   ret = psp_bootloader_load_sysdrv(psp);
+   if (ret)
+   return ret;
 
-   ret = psp_bootloader_load_sos(psp);
-   if (ret)
-   return ret;
+   ret = psp_bootloader_load_sos(psp);
+   if (ret)
+   return ret;
+   }
 
ret = psp_ring_create(psp, PSP_RING_TYPE__KM);
if (ret)
-- 
2.7.4
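
The guard added to psp_hw_start() here is not the same expression as the one around the firmware BO setup in the amdgpu_ucode_init_bo() hunk earlier in this thread. A minimal sketch of the two conditions follows, with the driver predicates replaced by plain booleans; the function names are illustrative and not driver code. It only tabulates the logic as written in the two diffs. Whether skipping the bootloader loads on bare metal is intended is not stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Guard around the firmware BO setup in amdgpu_ucode_init_bo(). */
static bool ucode_guard(bool sriov_vf, bool in_sriov_reset)
{
	return !sriov_vf || !in_sriov_reset;
}

/* Guard around the bootloader loads in psp_hw_start() in this patch. */
static bool psp_guard(bool sriov_vf, bool in_sriov_reset)
{
	return sriov_vf && !in_sriov_reset;
}

/* Count the input combinations on which the two guards disagree. */
static int count_disagreements(void)
{
	int n = 0;

	for (int vf = 0; vf <= 1; vf++) {
		for (int rst = 0; rst <= 1; rst++) {
			bool u = ucode_guard(vf, rst);
			bool p = psp_guard(vf, rst);

			/* With SRIOV active the two guards always agree. */
			if (vf)
				assert(u == p);
			if (u != p)
				n++;
		}
	}
	return n;
}
```

Both guards skip the guarded work during an SRIOV reset and run it on an SRIOV VF outside of reset. They differ only when amdgpu_sriov_vf() is false: the ucode guard runs its body, while the psp guard skips it.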



[PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible

2017-09-17 Thread Monk Liu
Otherwise a gpu hang will leave the application unkillable.

Change-Id: I6051b5b3ae1188983f49325a2438c84a6c12374a
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 14 +-
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cc9a232..6ff2959 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -736,8 +736,8 @@ struct amdgpu_ctx_mgr {
 struct amdgpu_ctx *amdgpu_ctx_get(struct amdgpu_fpriv *fpriv, uint32_t id);
 int amdgpu_ctx_put(struct amdgpu_ctx *ctx);
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
- struct dma_fence *fence);
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+ struct dma_fence *fence, uint64_t *seq);
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
   struct amdgpu_ring *ring, uint64_t seq);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b59749d..4ac7a92 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1043,6 +1043,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
struct amd_sched_entity *entity = &p->ctx->rings[ring->idx].entity;
struct amdgpu_job *job;
unsigned i;
+   uint64_t seq;
+
int r;
 
amdgpu_mn_lock(p->mn);
@@ -1071,8 +1073,14 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
job->owner = p->filp;
job->fence_ctx = entity->fence_context;
p->fence = dma_fence_get(&job->base.s_fence->finished);
-   cs->out.handle = amdgpu_ctx_add_fence(p->ctx, ring, p->fence);
-   job->uf_sequence = cs->out.handle;
+   r = amdgpu_ctx_add_fence(p->ctx, ring, p->fence, &seq);
+   if (r) {
+   dma_fence_put(p->fence);
+   return r;
+   }
+
+   cs->out.handle = seq;
+   job->uf_sequence = seq;
amdgpu_job_free_resources(job);
 
trace_amdgpu_cs_ioctl(job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index a11e443..97f8be4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -246,8 +246,8 @@ int amdgpu_ctx_put(struct amdgpu_ctx *ctx)
return 0;
 }
 
-uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
- struct dma_fence *fence)
+int amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+ struct dma_fence *fence, uint64_t* handler)
 {
struct amdgpu_ctx_ring *cring = & ctx->rings[ring->idx];
uint64_t seq = cring->sequence;
@@ -258,9 +258,11 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, 
struct amdgpu_ring *ring,
other = cring->fences[idx];
if (other) {
signed long r;
-   r = dma_fence_wait_timeout(other, false, MAX_SCHEDULE_TIMEOUT);
-   if (r < 0)
+   r = dma_fence_wait_timeout(other, true, MAX_SCHEDULE_TIMEOUT);
+   if (r < 0) {
DRM_ERROR("Error (%ld) waiting for fence!\n", r);
+   return -ERESTARTSYS;
+   }
}
 
dma_fence_get(fence);
@@ -271,8 +273,10 @@ uint64_t amdgpu_ctx_add_fence(struct amdgpu_ctx *ctx, 
struct amdgpu_ring *ring,
spin_unlock(&ctx->ring_lock);
 
dma_fence_put(other);
+   if (handler)
+   *handler = seq;
 
-   return seq;
+   return 0;
 }
 
 struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx,
-- 
2.7.4



[PATCH 02/18] drm/amdgpu:no kiq in IH

2017-09-17 Thread Monk Liu
Change-Id: I4deb65675d2531236b2f4e2bc6f015c657546464
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 67610f7..c291e33 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
@@ -219,9 +219,9 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev)
wptr, adev->irq.ih.rptr, tmp);
adev->irq.ih.rptr = tmp;
 
-   tmp = RREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
+   tmp = RREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL));
tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
-   WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
+   WREG32_NO_KIQ(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_RB_CNTL), tmp);
}
return (wptr & adev->irq.ih.ptr_mask);
 }
-- 
2.7.4



Re: [PATCH 1/1] amdgpu: move asic id table to a separate file

2017-09-17 Thread Zhang, Jerry (Junwei)

Looks fine to me, feel free to add my RB.
Reviewed-by: Junwei Zhang 

BTW, we also have one or two patches to improve the name parsing.
Please also take a look.

Jerry

On 05/11/2017 05:10 AM, Li, Samuel wrote:

Also attaching a sample ids file for reference. The names are from marketing 
and are not related to the source code, so no review is necessary here :)  It 
can be put in the directory /usr/share/libdrm.

Sam
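
For readers without the attachment: going by parse_one_line() in the patch below, each line of the ids file carries a hexadecimal device id, a hexadecimal revision id and a marketing name, separated by commas, and lines starting with '#' are skipped. The following is a stand-alone sketch of that parsing, not the code from the patch. The struct, the helper names and the sample line used with it are made up for illustration, and it differs from the patch in that it tests for blank and comment lines before copying and reports a failed strdup() as -ENOMEM.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() and strtok_r() */

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct amdgpu_asic_id from the patch. */
struct asic_id_entry {
	uint32_t did;
	uint32_t rid;
	char *marketing_name; /* heap-allocated; the caller frees it */
};

/* Parse one hexadecimal field; reject empty or partly numeric tokens. */
static int parse_hex_field(const char *field, uint32_t *out)
{
	char *endptr;
	unsigned long value;

	if (!field || !*field)
		return -EINVAL;

	errno = 0;
	value = strtoul(field, &endptr, 16);
	if (errno || endptr == field || *endptr || value > UINT32_MAX)
		return -EINVAL;

	*out = (uint32_t)value;
	return 0;
}

/* Returns 0 on success, -EAGAIN for blank or comment lines. */
static int parse_id_line(const char *line, struct asic_id_entry *id)
{
	char *buf, *save = NULL, *name;
	int r;

	assert(line && id);

	if (line[0] == '\0' || line[0] == '#')
		return -EAGAIN;

	buf = strdup(line); /* strtok_r() modifies its input */
	if (!buf)
		return -ENOMEM;

	r = parse_hex_field(strtok_r(buf, ",", &save), &id->did);
	if (!r)
		r = parse_hex_field(strtok_r(NULL, ",", &save), &id->rid);
	if (!r) {
		name = strtok_r(NULL, ",", &save);
		if (!name) {
			r = -EINVAL;
		} else {
			id->marketing_name = strdup(name);
			if (!id->marketing_name)
				r = -ENOMEM;
		}
	}

	free(buf);
	return r;
}
```

With a made-up line such as "1234,AB,Example Device" this fills in did = 0x1234, rid = 0xAB and a copy of the name. Whitespace around the name is kept as-is.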

-Original Message-
From: Li, Samuel
Sent: Wednesday, May 10, 2017 4:57 PM
To: amd-gfx@lists.freedesktop.org
Cc: Yuan, Xiaojie ; Li, Samuel 
Subject: [PATCH 1/1] amdgpu: move asic id table to a separate file

From: Xiaojie Yuan 

Change-Id: I12216da14910f5e2b0970bc1fafc2a20b0ef1ba9
Signed-off-by: Samuel Li 
---
  amdgpu/Makefile.am   |   2 +
  amdgpu/Makefile.sources  |   2 +-
  amdgpu/amdgpu_asic_id.c  | 198 +++
  amdgpu/amdgpu_asic_id.h  | 165 ---
  amdgpu/amdgpu_device.c   |  28 +--
  amdgpu/amdgpu_internal.h |  10 +++
  6 files changed, 232 insertions(+), 173 deletions(-)
  create mode 100644 amdgpu/amdgpu_asic_id.c
  delete mode 100644 amdgpu/amdgpu_asic_id.h

diff --git a/amdgpu/Makefile.am b/amdgpu/Makefile.am
index cf7bc1b..ecf9e82 100644
--- a/amdgpu/Makefile.am
+++ b/amdgpu/Makefile.am
@@ -30,6 +30,8 @@ AM_CFLAGS = \
$(PTHREADSTUBS_CFLAGS) \
-I$(top_srcdir)/include/drm

+AM_CPPFLAGS = -DAMDGPU_ASIC_ID_TABLE=\"${datadir}/libdrm/amdgpu.ids\"
+
  libdrm_amdgpu_la_LTLIBRARIES = libdrm_amdgpu.la
  libdrm_amdgpu_ladir = $(libdir)
  libdrm_amdgpu_la_LDFLAGS = -version-number 1:0:0 -no-undefined
diff --git a/amdgpu/Makefile.sources b/amdgpu/Makefile.sources
index 487b9e0..bc3abaa 100644
--- a/amdgpu/Makefile.sources
+++ b/amdgpu/Makefile.sources
@@ -1,5 +1,5 @@
  LIBDRM_AMDGPU_FILES := \
-   amdgpu_asic_id.h \
+   amdgpu_asic_id.c \
amdgpu_bo.c \
amdgpu_cs.c \
amdgpu_device.c \
diff --git a/amdgpu/amdgpu_asic_id.c b/amdgpu/amdgpu_asic_id.c
new file mode 100644
index 000..d50e21a
--- /dev/null
+++ b/amdgpu/amdgpu_asic_id.c
@@ -0,0 +1,198 @@
+/*
+ * Copyright © 2017 Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifdef HAVE_CONFIG_H
+#include "config.h"
+#endif
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
+
+static int parse_one_line(const char *line, struct amdgpu_asic_id *id)
+{
+   char *buf;
+   char *s_did;
+   char *s_rid;
+   char *s_name;
+   char *endptr;
+   int r = 0;
+
+   buf = strdup(line);
+   if (!buf)
+   return -ENOMEM;
+
+   /* ignore empty line and commented line */
+   if (strlen(line) == 0 || line[0] == '#') {
+   r = -EAGAIN;
+   goto out;
+   }
+
+   /* device id */
+   s_did = strtok(buf, ",");
+   if (!s_did) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   id->did = strtol(s_did, &endptr, 16);
+   if (*endptr) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   /* revision id */
+   s_rid = strtok(NULL, ",");
+   if (!s_rid) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   id->rid = strtol(s_rid, &endptr, 16);
+   if (*endptr) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   /* marketing name */
+   s_name = strtok(NULL, ",");
+   if (!s_name) {
+   r = -EINVAL;
+   goto out;
+   }
+
+   id->marketing_name = strdup(s_name);
+   if (id->marketing_name == NULL) {
+   r = -EINVAL;
+   goto out;
+   }
+
+out:
+   free(buf);
+
+   return r;
+}
+
+int amdgpu_parse_asic_ids(struct amdgpu_asic_id **p_asic_id_table)
+{
+   struct amdgpu_asic_id *asic_id_table;
+   struct amdgpu_asic_i

Re: [PATCH] drm/amdgpu/psp: declare raven psp firmware

2017-09-17 Thread Zhang, Jerry (Junwei)

On 09/16/2017 05:37 AM, Alex Deucher wrote:

So it gets picked up properly by the kernel.

Signed-off-by: Alex Deucher 

Reviewed-by: Junwei Zhang 


---
  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
index 6ec5c9f..77cab1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
@@ -35,6 +35,8 @@
  #include "raven1/GC/gc_9_1_offset.h"
  #include "raven1/SDMA0/sdma0_4_1_offset.h"

+MODULE_FIRMWARE("amdgpu/raven_asd.bin");
+
  static int
  psp_v10_0_get_fw_type(struct amdgpu_firmware_info *ucode, enum 
psp_gfx_fw_type *type)
  {




Re: [PATCH] drm/amdkfd: check for null dev to avoid a null pointer dereference

2017-09-17 Thread Oded Gabbay
On Fri, Sep 8, 2017 at 5:13 PM, Colin King  wrote:
> From: Colin Ian King 
>
> The call to kfd_device_by_id can potentially return null, so check whether
> dev is null and return -EINVAL to avoid a null pointer dereference.
>
> Detected by CoverityScan CID#1454629 ("Dereference null return value")
>
> Fixes: 5d71dbc3a588 ("drm/amdkfd: Implement image tiling mode support v2")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index e4a8c2e52cb2..660b3fbade41 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -892,6 +892,8 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
> int err = 0;
>
> dev = kfd_device_by_id(args->gpu_id);
> +   if (!dev)
> +   return -EINVAL;
>
> dev->kfd2kgd->get_tile_config(dev->kgd, &config);
>
> --
> 2.14.1
>
Thanks!
Applied to my -fixes tree


Re: [PATCH 10/11] drm/amdkfd: Print event limit messages only once per process

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling  wrote:
> To avoid spamming the log.
>
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 -
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 5979158..944abfa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -292,7 +292,10 @@ static int create_signal_event(struct file *devkfd,
> struct kfd_event *ev)
>  {
> if (p->signal_event_count == KFD_SIGNAL_EVENT_LIMIT) {
> -   pr_warn("Signal event wasn't created because limit was 
> reached\n");
> +   if (!p->signal_event_limit_reached) {
> +   pr_warn("Signal event wasn't created because limit 
> was reached\n");
> +   p->signal_event_limit_reached = true;
> +   }
> return -ENOMEM;
> }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index bb71697..a546d01 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -532,6 +532,7 @@ struct kfd_process {
> struct list_head signal_event_pages;
> u32 next_nonsignal_event_id;
> size_t signal_event_count;
> +   bool signal_event_limit_reached;
>  };
>
>  /**
> --
> 2.7.4
>
This patch is:
Reviewed-by: Oded Gabbay 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 09/11] drm/amdkfd: Fix kernel-queue wrapping bugs

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> Avoid intermediate negative numbers when doing calculations with a mix
> of signed and unsigned variables where implicit conversions can lead
> to unexpected results.
>
> When kernel queue buffer wraps around to 0, we need to check that rptr
> won't be overwritten by the new packet.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 9ebb4c1..1c66334 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -210,6 +210,11 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
> uint32_t wptr, rptr;
> unsigned int *queue_address;
>
> +   /* When rptr == wptr, the buffer is empty.
Start the comment text on a new line. The first line should be just /*

> +* When rptr == wptr + 1, the buffer is full.
> +* It is always rptr that advances to the position of wptr, rather 
> than
> +* the opposite. So we can only use up to queue_size_dwords - 1 
> dwords.
> +*/
> rptr = *kq->rptr_kernel;
> wptr = *kq->wptr_kernel;
> queue_address = (unsigned int *)kq->pq_kernel_addr;
> @@ -219,11 +224,10 @@ static int acquire_packet_buffer(struct kernel_queue 
> *kq,
> pr_debug("wptr: %d\n", wptr);
> pr_debug("queue_address 0x%p\n", queue_address);
>
> -   available_size = (rptr - 1 - wptr + queue_size_dwords) %
> +   available_size = (rptr + queue_size_dwords - 1 - wptr) %
> queue_size_dwords;
>
> -   if (packet_size_in_dwords >= queue_size_dwords ||
> -   packet_size_in_dwords >= available_size) {
> +   if (packet_size_in_dwords > available_size) {
> /*
>  * make sure calling functions know
>  * acquire_packet_buffer() failed
> @@ -233,6 +237,14 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
> }
>
> if (wptr + packet_size_in_dwords >= queue_size_dwords) {
> +   /* make sure after rolling back to position 0, there is
> +* still enough space.
> +*/
> +   if (packet_size_in_dwords >= rptr) {
> +   *buffer_ptr = NULL;
> +   return -ENOMEM;
> +   }

I don't think the condition is correct.
Suppose queue_size_dwords == 100, wptr == rptr == 50 (the queue is empty),
and we have a new packet of size 70.
Now wptr + size is 120, which is >= 100.
However, 70 >= rptr (50), which gives us -ENOMEM. That is not the
correct result, because the packet *does* have enough room in the
queue.

I think the condition should be:
if (packet_size_in_dwords - (queue_size_dwords - wptr) >= rptr)
but please check this.
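
The numbers above can be checked with a small user-space sketch that evaluates both conditions side by side. The helper names are illustrative and not part of kfd_kernel_queue.c. Which check is right depends on whether a packet may be split across the wrap point: the "fill nops, roll back and start at position 0" comment in the quoted hunk suggests the packet is written contiguously from position 0, in which case the whole packet has to fit below rptr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Free space in dwords; one slot stays unused so full and empty differ. */
static uint32_t available_dwords(uint32_t rptr, uint32_t wptr, uint32_t size)
{
	return (rptr + size - 1 - wptr) % size;
}

/* True when the packet would run past the end of the buffer. */
static bool needs_wrap(uint32_t wptr, uint32_t pkt, uint32_t size)
{
	return wptr + pkt >= size;
}

/* Check from the patch: after rolling back to 0 the whole packet must fit below rptr. */
static bool patch_check_rejects(uint32_t pkt, uint32_t rptr)
{
	return pkt >= rptr;
}

/* Check suggested above: only the part past the end must fit below rptr. */
static bool review_check_rejects(uint32_t pkt, uint32_t rptr,
				 uint32_t wptr, uint32_t size)
{
	/* Only meaningful on the wrap path; keeps the subtraction from underflowing. */
	assert(needs_wrap(wptr, pkt, size));
	return pkt - (size - wptr) >= rptr;
}
```

For queue_size_dwords == 100, wptr == rptr == 50 and a 70-dword packet, available_dwords() gives 99, needs_wrap() is true, the check from the patch rejects the packet (70 >= 50) and the suggested check accepts it (70 - 50 = 20 < 50).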

> +   /* fill nops, roll back and start at position 0 */
> while (wptr > 0) {
> queue_address[wptr] = kq->nop_packet;
> wptr = (wptr + 1) % queue_size_dwords;
> --
> 2.7.4
>


Re: [PATCH 06/11] drm/amdkfd: Use VMID bitmap from KGD

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> The hard-coded values related to VMID were removed in KFD, as those
> values can be calculated in the KFD initialization function.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c|  9 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c|  7 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 13 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  4 
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  |  7 +++
>  drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |  2 +-
>  6 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> index 0aa021a..7d5635f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> @@ -769,13 +769,8 @@ int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, 
> struct kfd_process *p)
> union GRBM_GFX_INDEX_BITS reg_gfx_index;
> struct kfd_process_device *pdd;
> struct dbg_wave_control_info wac_info;
> -   int temp;
> -   int first_vmid_to_scan = 8;
> -   int last_vmid_to_scan = 15;
> -
> -   first_vmid_to_scan = ffs(dev->shared_resources.compute_vmid_bitmap) - 
> 1;
> -   temp = dev->shared_resources.compute_vmid_bitmap >> 
> first_vmid_to_scan;
> -   last_vmid_to_scan = first_vmid_to_scan + ffz(temp);
> +   int first_vmid_to_scan = dev->vm_info.first_vmid_kfd;
> +   int last_vmid_to_scan = dev->vm_info.last_vmid_kfd;
>
> reg_sq_cmd.u32All = 0;
> status = 0;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index ff3f97c..abf91b0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -223,9 +223,16 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>  const struct kgd2kfd_shared_resources *gpu_resources)
>  {
> unsigned int size;
> +   unsigned int vmid_bitmap_kfd;
>
> kfd->shared_resources = *gpu_resources;
>
> +   vmid_bitmap_kfd = kfd->shared_resources.compute_vmid_bitmap;
Unnecessary copy, just use kfd->shared_resources.compute_vmid_bitmap
in the below lines. If you want a shorter name, use a pointer.
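
As a reference for the arithmetic in the hunk, here is a user-space sketch that works on the bitmap directly, without an intermediate copy. The struct and helper names are illustrative, and ffs()/fls() are reimplemented because the kernel helpers are not available in user space. The bitmap 0xFF00 is an assumed value, picked because it matches the VMID range 8 to 15 that was hard-coded before. Like the hunk, the sketch assumes the set bits are contiguous.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the lowest set bit, 0 if none (kernel ffs() convention). */
static int sketch_ffs(uint32_t x)
{
	int n = 1;

	if (!x)
		return 0;
	while (!(x & 1u)) {
		x >>= 1;
		n++;
	}
	return n;
}

/* 1-based index of the highest set bit, 0 if none (kernel fls() convention). */
static int sketch_fls(uint32_t x)
{
	int n = 0;

	while (x) {
		x >>= 1;
		n++;
	}
	return n;
}

/* Hypothetical stand-in for the vm_info fields added by the patch. */
struct vm_info_sketch {
	uint32_t first_vmid_kfd;
	uint32_t last_vmid_kfd;
	uint32_t vmid_num_kfd;
};

static struct vm_info_sketch vm_info_from_bitmap(uint32_t compute_vmid_bitmap)
{
	struct vm_info_sketch vi;

	assert(compute_vmid_bitmap != 0); /* an empty bitmap has no first or last bit */

	vi.first_vmid_kfd = (uint32_t)sketch_ffs(compute_vmid_bitmap) - 1;
	vi.last_vmid_kfd = (uint32_t)sketch_fls(compute_vmid_bitmap) - 1;
	vi.vmid_num_kfd = vi.last_vmid_kfd - vi.first_vmid_kfd + 1;
	return vi;
}
```

For 0xFF00 this yields first_vmid_kfd = 8, last_vmid_kfd = 15 and vmid_num_kfd = 8.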

> +   kfd->vm_info.first_vmid_kfd = ffs(vmid_bitmap_kfd) - 1;
> +   kfd->vm_info.last_vmid_kfd = fls(vmid_bitmap_kfd) - 1;
> +   kfd->vm_info.vmid_num_kfd = kfd->vm_info.last_vmid_kfd
> +   - kfd->vm_info.first_vmid_kfd + 1;
> +
> /* calculate max size of mqds needed for queues */
> size = max_num_of_queues_per_device *
> kfd->device_info->mqd_size_aligned;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 5da7ef4..897ff083 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -113,11 +113,11 @@ static int allocate_vmid(struct device_queue_manager 
> *dqm,
> if (dqm->vmid_bitmap == 0)
> return -ENOMEM;
>
> -   bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap, 
> CIK_VMID_NUM);
> +   bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap,
> +   dqm->dev->vm_info.vmid_num_kfd);
> clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
>
> -   /* Kaveri kfd vmid's starts from vmid 8 */
> -   allocated_vmid = bit + KFD_VMID_START_OFFSET;
> +   allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
> pr_debug("vmid allocation %d\n", allocated_vmid);
> qpd->vmid = allocated_vmid;
> q->properties.vmid = allocated_vmid;
> @@ -132,7 +132,7 @@ static void deallocate_vmid(struct device_queue_manager 
> *dqm,
> struct qcm_process_device *qpd,
> struct queue *q)
>  {
> -   int bit = qpd->vmid - KFD_VMID_START_OFFSET;
> +   int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
>
> /* Release the vmid mapping */
> set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
> @@ -507,7 +507,7 @@ static int initialize_nocpsch(struct device_queue_manager 
> *dqm)
> dqm->allocated_queues[pipe] |= 1 << queue;
> }
>
> -   dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1;
> +   dqm->vmid_bitmap = (1 << dqm->dev->vm_info.vmid_num_kfd) - 1;
> dqm->sdma_bitmap = (1 << CIK_SDMA_QUEUES) - 1;
>
> return 0;
> @@ -613,8 +613,7 @@ static int set_sched_resources(struct 
> device_queue_manager *dqm)
> int i, mec;
> struct scheduling_resources res;
>
> -   res.vmid_mask = (1 << VMID_PER_DEVICE) - 1;
> -   res.vmid_mask <<= KFD_VM

Re: [PATCH 08/11] drm/amdkfd: Drop _nocpsch suffix from shared functions

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> Several functions in DQM are shared between cpsch and nocpsch code.
> Remove the misleading _nocpsch suffix from their names.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 24 
> +++---
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 0ecea67..169e061 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -386,7 +386,7 @@ static int update_queue(struct device_queue_manager *dqm, 
> struct queue *q)
> return retval;
>  }
>
> -static struct mqd_manager *get_mqd_manager_nocpsch(
> +static struct mqd_manager *get_mqd_manager(
> struct device_queue_manager *dqm, enum KFD_MQD_TYPE type)
>  {
> struct mqd_manager *mqd;
> @@ -407,7 +407,7 @@ static struct mqd_manager *get_mqd_manager_nocpsch(
> return mqd;
>  }
>
> -static int register_process_nocpsch(struct device_queue_manager *dqm,
> +static int register_process(struct device_queue_manager *dqm,
> struct qcm_process_device *qpd)
>  {
> struct device_process_node *n;
> @@ -431,7 +431,7 @@ static int register_process_nocpsch(struct 
> device_queue_manager *dqm,
> return retval;
>  }
>
> -static int unregister_process_nocpsch(struct device_queue_manager *dqm,
> +static int unregister_process(struct device_queue_manager *dqm,
> struct qcm_process_device *qpd)
>  {
> int retval;
> @@ -513,7 +513,7 @@ static int initialize_nocpsch(struct device_queue_manager 
> *dqm)
> return 0;
>  }
>
> -static void uninitialize_nocpsch(struct device_queue_manager *dqm)
> +static void uninitialize(struct device_queue_manager *dqm)
>  {
> int i;
>
> @@ -1097,10 +1097,10 @@ struct device_queue_manager 
> *device_queue_manager_init(struct kfd_dev *dev)
> dqm->ops.stop = stop_cpsch;
> dqm->ops.destroy_queue = destroy_queue_cpsch;
> dqm->ops.update_queue = update_queue;
> -   dqm->ops.get_mqd_manager = get_mqd_manager_nocpsch;
> -   dqm->ops.register_process = register_process_nocpsch;
> -   dqm->ops.unregister_process = unregister_process_nocpsch;
> -   dqm->ops.uninitialize = uninitialize_nocpsch;
> +   dqm->ops.get_mqd_manager = get_mqd_manager;
> +   dqm->ops.register_process = register_process;
> +   dqm->ops.unregister_process = unregister_process;
> +   dqm->ops.uninitialize = uninitialize;
> dqm->ops.create_kernel_queue = create_kernel_queue_cpsch;
> dqm->ops.destroy_kernel_queue = destroy_kernel_queue_cpsch;
> dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
> @@ -1112,11 +1112,11 @@ struct device_queue_manager 
> *device_queue_manager_init(struct kfd_dev *dev)
> dqm->ops.create_queue = create_queue_nocpsch;
> dqm->ops.destroy_queue = destroy_queue_nocpsch;
> dqm->ops.update_queue = update_queue;
> -   dqm->ops.get_mqd_manager = get_mqd_manager_nocpsch;
> -   dqm->ops.register_process = register_process_nocpsch;
> -   dqm->ops.unregister_process = unregister_process_nocpsch;
> +   dqm->ops.get_mqd_manager = get_mqd_manager;
> +   dqm->ops.register_process = register_process;
> +   dqm->ops.unregister_process = unregister_process;
> dqm->ops.initialize = initialize_nocpsch;
> -   dqm->ops.uninitialize = uninitialize_nocpsch;
> +   dqm->ops.uninitialize = uninitialize;
> dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
> break;
> default:
> --
> 2.7.4
>

This patch is:
Reviewed-by: Oded Gabbay 


Re: [PATCH 05/11] drm/amdkfd: Fix incorrect destroy_mqd parameter

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> When uninitializing a kernel queue.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 0c82446..09356d0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -184,7 +184,7 @@ static void uninitialize(struct kernel_queue *kq)
> if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
> kq->mqd->destroy_mqd(kq->mqd,
> NULL,
> -   false,
> +   KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
> KFD_UNMAP_LATENCY_MS,
> kq->queue->pipe,
> kq->queue->queue);
> --
> 2.7.4
>
This patch is:
Reviewed-by: Oded Gabbay 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 04/11] drm/amdkfd: Adjust dequeue latencies and timeouts

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> Adjust latencies and timeouts for dequeueing with HWS and consolidate
> them in one place. Make them longer to allow long running waves to
> complete without causing a timeout. The timeout is twice as long as the
> latency plus some buffer to make sure we don't detect a timeout
> prematurely.
>
> Change timeouts for dequeueing HQDs without HWS. KFD_UNMAP_LATENCY is
> more consistent with what the HWS does for user queues.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 4 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ---
>  5 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 3db6a31..5da7ef4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -323,7 +323,7 @@ static int destroy_queue_nocpsch(struct 
> device_queue_manager *dqm,
>
> retval = mqd->destroy_mqd(mqd, q->mqd,
> KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
> -   QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
> +   KFD_UNMAP_LATENCY_MS,
> q->pipe, q->queue);
>
> if (retval)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index faf820a..99e2305 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -29,7 +29,9 @@
>  #include "kfd_priv.h"
>  #include "kfd_mqd_manager.h"
>
> -#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS   (500)
> +#define KFD_UNMAP_LATENCY_MS   (4000)
> +#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (2 * KFD_UNMAP_LATENCY_MS + 1000)
> +
>  #define CIK_VMID_NUM   (8)
>  #define KFD_VMID_START_OFFSET  (8)
>  #define VMID_PER_DEVICECIK_VMID_NUM
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 681b639..0c82446 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -185,7 +185,7 @@ static void uninitialize(struct kernel_queue *kq)
> kq->mqd->destroy_mqd(kq->mqd,
> NULL,
> false,
> -   QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
> +   KFD_UNMAP_LATENCY_MS,
> kq->queue->pipe,
> kq->queue->queue);
> else if (kq->queue->properties.type == KFD_QUEUE_TYPE_DIQ)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 1d31260..9eda884 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -376,7 +376,7 @@ int pm_send_set_resources(struct packet_manager *pm,
> packet->bitfields2.queue_type =
> 
> queue_type__mes_set_resources__hsa_interface_queue_hiq;
> packet->bitfields2.vmid_mask = res->vmid_mask;
> -   packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY;
> +   packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100;
> packet->bitfields7.oac_mask = res->oac_mask;
> packet->bitfields8.gds_heap_base = res->gds_heap_base;
> packet->bitfields8.gds_heap_size = res->gds_heap_size;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index f8d6a8e..099dc33 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -673,11 +673,8 @@ int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
>
>  /* Packet Manager */
>
> -#define KFD_HIQ_TIMEOUT (500)
> -
>  #define KFD_FENCE_COMPLETED (100)
>  #define KFD_FENCE_INIT   (10)
> -#define KFD_UNMAP_LATENCY (150)
>
>  struct packet_manager {
> struct device_queue_manager *dqm;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

This patch is:
Reviewed-by: Oded Gabbay 
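
For readers following the arithmetic in the quoted hunks, here is a small sketch of how the new constants relate. The "/ 100" in pm_send_set_resources() suggests the packet field counts 100 ms units; that is an inference from the patch, not something it states. The helper names unmap_latency_field() and preempt_timeout_ms() are hypothetical.

```c
#include <assert.h>

/* Values copied from the quoted patch. */
#define KFD_UNMAP_LATENCY_MS             (4000)
#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (2 * KFD_UNMAP_LATENCY_MS + 1000)

/*
 * Assumption: the packet field counts 100 ms units (inferred from the
 * "/ 100" in pm_send_set_resources(), not confirmed by the patch).
 */
static unsigned int unmap_latency_field(unsigned int latency_ms)
{
	assert(latency_ms % 100 == 0); /* avoid silent truncation */
	return latency_ms / 100;
}

/* Hypothetical helper: driver-side wait derived from the unmap latency. */
static unsigned int preempt_timeout_ms(unsigned int latency_ms)
{
	return 2 * latency_ms + 1000;
}
```

With the values from the patch, the field written to the packet is 4000 / 100 = 40, and the default preemption timeout is 2 * 4000 + 1000 = 9000 ms, so the driver waits longer than the latency it programs.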


Re: [PATCH 02/11] drm/amdkfd: Fix suspend/resume issue on Carrizo

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> When we suspend and resume through "sudo pm-suspend" while HSA activity
> is running, the HWS hangs upon resume because of memory read/write
> failures. The root cause is that, on suspend, we neglected to unbind
> the pasid from the kfd device.
>
> Another major change is that binding/unbinding is now performed on a
> per-process basis, instead of depending on whether there are queues
> in the dqm.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c| 22 --
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 13 
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h  | 15 +++-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 89 ++
>  4 files changed, 101 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index cc8af11..ff3f97c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -191,7 +191,7 @@ static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>
> if (dev)
> -   kfd_unbind_process_from_device(dev, pasid);
> +   kfd_process_iommu_unbind_callback(dev, pasid);
>  }
>
>  /*
> @@ -339,12 +339,16 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
>
>  void kgd2kfd_suspend(struct kfd_dev *kfd)
>  {
> -   if (kfd->init_complete) {
> -   kfd->dqm->ops.stop(kfd->dqm);
> -   amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> -   amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> -   amd_iommu_free_device(kfd->pdev);
> -   }
> +   if (!kfd->init_complete)
> +   return;
> +
> +   kfd->dqm->ops.stop(kfd->dqm);
> +
> +   kfd_unbind_processes_from_device(kfd);
> +
> +   amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> +   amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> +   amd_iommu_free_device(kfd->pdev);
>  }
>
>  int kgd2kfd_resume(struct kfd_dev *kfd)
> @@ -369,6 +373,10 @@ static int kfd_resume(struct kfd_dev *kfd)
> amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>  iommu_invalid_ppr_cb);
>
> +   err = kfd_bind_processes_to_device(kfd);
> +   if (err)
> +   return -ENXIO;

You need to undo the previous initialization if
kfd_bind_processes_to_device() fails, i.e. call amd_iommu_free_device()
before returning.
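
A minimal userspace sketch of the unwinding requested here, assuming the usual goto-based cleanup style. The stub_* functions and the fail_bind switch are hypothetical stand-ins for the driver calls so that the sketch builds outside the kernel; this is not the actual kfd_resume() code.

```c
#include <errno.h>

/* Hypothetical stand-ins for the driver calls. */
static int iommu_initialized;
static int fail_bind; /* set to 1 to simulate a bind failure */

static int stub_iommu_init_device(void) { iommu_initialized = 1; return 0; }
static void stub_iommu_free_device(void) { iommu_initialized = 0; }
static int stub_bind_processes(void) { return fail_bind ? -ENXIO : 0; }
static int stub_dqm_start(void) { return 0; }

static int resume_sketch(void)
{
	int err;

	err = stub_iommu_init_device();
	if (err < 0)
		return err;

	err = stub_bind_processes();
	if (err)
		goto err_free_iommu;

	err = stub_dqm_start();
	if (err)
		goto err_free_iommu;

	return 0;

err_free_iommu:
	/* Undo the earlier IOMMU init before reporting the failure. */
	stub_iommu_free_device();
	return err;
}
```

Every failure after the IOMMU init goes through the same label, so a later step cannot return with the device still initialized.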

> +
> err = kfd->dqm->ops.start(kfd->dqm);
> if (err) {
> dev_err(kfd_device,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 53a66e8..5db82b8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -670,7 +670,6 @@ static int initialize_cpsch(struct device_queue_manager *dqm)
>
>  static int start_cpsch(struct device_queue_manager *dqm)
>  {
> -   struct device_process_node *node;
> int retval;
>
> retval = 0;
> @@ -697,11 +696,6 @@ static int start_cpsch(struct device_queue_manager *dqm)
>
> init_interrupts(dqm);
>
> -   list_for_each_entry(node, &dqm->queues, list)
> -   if (node->qpd->pqm->process && dqm->dev)
> -   kfd_bind_process_to_device(dqm->dev,
> -   node->qpd->pqm->process);
> -
> execute_queues_cpsch(dqm, true);
>
> return 0;
> @@ -714,15 +708,8 @@ static int start_cpsch(struct device_queue_manager *dqm)
>
>  static int stop_cpsch(struct device_queue_manager *dqm)
>  {
> -   struct device_process_node *node;
> -   struct kfd_process_device *pdd;
> -
> destroy_queues_cpsch(dqm, true, true);
>
> -   list_for_each_entry(node, &dqm->queues, list) {
> -   pdd = qpd_to_pdd(node->qpd);
> -   pdd->bound = false;
> -   }
> kfd_gtt_sa_free(dqm->dev, dqm->fence_mem);
> pm_uninit(&dqm->packets);
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index b397ec7..ef582cc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -435,6 +435,13 @@ struct qcm_process_device {
> uint32_t sh_hidden_private_base;
>  };
>
> +
> +enum kfd_pdd_bound {
> +   PDD_UNBOUND = 0,
> +   PDD_BOUND,
> +   PDD_BOUND_SUSPENDED,
> +};
> +
>  /* Data that is per-process-per device. */
>  struct kfd_process_device {
> /*
> @@ -459,7 +466,7 @@ struct kfd_process_device {
> uint64_t scratch_limit;
>
> /* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> -   bool bound;
> +   enum kfd_pdd_bound boun
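
The quoted message is cut off here. The three-state enum replaces the earlier boolean, and the commit description says binding is now handled per process across suspend/resume. A minimal sketch of how such a state could be driven follows; the transitions are an assumption based on the enum names, since the kfd_process.c hunk is not shown above. struct pdd_sketch and both function names are hypothetical.

```c
/* Enum copied from the quoted kfd_priv.h hunk. */
enum kfd_pdd_bound {
	PDD_UNBOUND = 0,
	PDD_BOUND,
	PDD_BOUND_SUSPENDED,
};

/* Hypothetical, reduced per-process-per-device record. */
struct pdd_sketch {
	enum kfd_pdd_bound bound;
};

/* On suspend: remember which processes were bound (assumed behavior). */
static void unbind_all_sketch(struct pdd_sketch *pdds, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (pdds[i].bound == PDD_BOUND)
			pdds[i].bound = PDD_BOUND_SUSPENDED;
}

/* On resume: rebind only what was bound before suspend (assumed behavior). */
static int bind_all_sketch(struct pdd_sketch *pdds, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (pdds[i].bound != PDD_BOUND_SUSPENDED)
			continue;
		/* A real driver would rebind the pasid here and check for errors. */
		pdds[i].bound = PDD_BOUND;
	}
	return 0;
}
```

The extra state is what lets resume tell a process that was never bound apart from one that merely lost its binding during suspend.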

Re: [PATCH 01/11] drm/amdkfd: Reorganize kfd resume code

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> The idea is to let the kfd init and resume functions share the same code
> path as much as possible, rather than having two copies of almost
> identical code. This improves code readability and maintainability.
>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c | 78 +
>  1 file changed, 40 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 61fff25..cc8af11 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -92,6 +92,8 @@ static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
> unsigned int chunk_size);
>  static void kfd_gtt_sa_fini(struct kfd_dev *kfd);
>
> +static int kfd_resume(struct kfd_dev *kfd);
> +
>  static const struct kfd_device_info *lookup_device_info(unsigned short did)
>  {
> size_t i;
> @@ -176,15 +178,8 @@ static bool device_iommu_pasid_init(struct kfd_dev *kfd)
> pasid_limit,
> kfd->doorbell_process_limit - 1);
>
> -   err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -   if (err < 0) {
> -   dev_err(kfd_device, "error initializing iommu device\n");
> -   return false;
> -   }
> -
> if (!kfd_set_pasid_limit(pasid_limit)) {
> dev_err(kfd_device, "error setting pasid limit\n");
> -   amd_iommu_free_device(kfd->pdev);
> return false;
> }
>
> @@ -280,29 +275,22 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> goto kfd_interrupt_error;
> }
>
> -   if (!device_iommu_pasid_init(kfd)) {
> -   dev_err(kfd_device,
> -   "Error initializing iommuv2 for device %x:%x\n",
> -   kfd->pdev->vendor, kfd->pdev->device);
> -   goto device_iommu_pasid_error;
> -   }
> -   amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -   iommu_pasid_shutdown_callback);
> -   amd_iommu_set_invalid_ppr_cb(kfd->pdev, iommu_invalid_ppr_cb);
> -
> kfd->dqm = device_queue_manager_init(kfd);
> if (!kfd->dqm) {
> dev_err(kfd_device, "Error initializing queue manager\n");
> goto device_queue_manager_error;
> }
>
> -   if (kfd->dqm->ops.start(kfd->dqm)) {
> +   if (!device_iommu_pasid_init(kfd)) {
> dev_err(kfd_device,
> -   "Error starting queue manager for device %x:%x\n",
> +   "Error initializing iommuv2 for device %x:%x\n",
> kfd->pdev->vendor, kfd->pdev->device);
> -   goto dqm_start_error;
> +   goto device_iommu_pasid_error;
> }
>
> +   if (kfd_resume(kfd))
> +   goto kfd_resume_error;
> +
> kfd->dbgmgr = NULL;
>
> kfd->init_complete = true;
> @@ -314,11 +302,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
> goto out;
>
> -dqm_start_error:
> +kfd_resume_error:
> +device_iommu_pasid_error:
> device_queue_manager_uninit(kfd->dqm);
>  device_queue_manager_error:
> -   amd_iommu_free_device(kfd->pdev);
> -device_iommu_pasid_error:
> kfd_interrupt_exit(kfd);
>  kfd_interrupt_error:
> kfd_topology_remove_device(kfd);
> @@ -338,8 +325,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>  void kgd2kfd_device_exit(struct kfd_dev *kfd)
>  {
> if (kfd->init_complete) {
> +   kgd2kfd_suspend(kfd);
> device_queue_manager_uninit(kfd->dqm);
> -   amd_iommu_free_device(kfd->pdev);
> kfd_interrupt_exit(kfd);
> kfd_topology_remove_device(kfd);
> kfd_doorbell_fini(kfd);
> @@ -362,25 +349,40 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>
>  int kgd2kfd_resume(struct kfd_dev *kfd)
>  {
> -   unsigned int pasid_limit;
> -   int err;
> +   if (!kfd->init_complete)
> +   return 0;
>
> -   pasid_limit = kfd_get_pasid_limit();
> +   return kfd_resume(kfd);
>
> -   if (kfd->init_complete) {
> -   err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -   if (err < 0) {
> -   dev_err(kfd_device, "failed to initialize iommu\n");
> -   return -ENXIO;
> -   }
> +}
>
> -   amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -   iommu_pasid_shutdown_callback);
> -   amd_iommu_set_invalid_ppr_cb(kfd->pdev, iommu_invalid_ppr_cb);
> -   kfd->dqm->ops.start(kfd->dqm);
> +static int kfd_resume(struct kfd_dev *kfd)
> +{
> +   int err = 0;
> +   

Re: [PATCH 07/11] drm/amdkfd: Reuse CHIP_* from amdgpu

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling  wrote:
> From: Yong Zhao 
>
> There are already CHIP_* definitions under amd_shared.h file on amdgpu
> side, so KFD should reuse them rather than defining new ones.
>
> Using enum for asic type requires default cases on switch statements
> to prevent compiler warnings. BUG on unsupported ASICs. It should never
> get there because KFD should not be initialized on unsupported devices.

We made an effort to remove all BUG statements from the driver, so
please don't introduce new ones.
Even if the code should never reach there, that is no reason to crash
the entire kernel, as it doesn't affect the rest of the system's
functionality.
Oded.
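
One way to read this request, sketched in userspace C: the default case reports the unsupported ASIC and fails the init instead of crashing. The enum, the ops structs and sketch_ops_for() are hypothetical; in the kernel the message would go through something like dev_err() or pr_err() rather than fprintf().

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, reduced ASIC list; the real enum lives in amd_shared.h. */
enum sketch_asic_type {
	SKETCH_CHIP_KAVERI,
	SKETCH_CHIP_CARRIZO,
	SKETCH_CHIP_OTHER,
};

struct sketch_ops {
	const char *name;
};

static const struct sketch_ops cik_ops = { "cik" };
static const struct sketch_ops vi_ops = { "vi" };

static const struct sketch_ops *sketch_ops_for(enum sketch_asic_type asic)
{
	switch (asic) {
	case SKETCH_CHIP_KAVERI:
		return &cik_ops;
	case SKETCH_CHIP_CARRIZO:
		return &vi_ops;
	default:
		/* Report and fail the init instead of calling BUG(). */
		fprintf(stderr, "unsupported ASIC type %d\n", (int)asic);
		return NULL;
	}
}
```

The caller then treats NULL like any other init failure, which keeps the rest of the system running.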

>
> Signed-off-by: Yong Zhao 
> Signed-off-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c  | 2 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 9 +++--
>  4 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 897ff083..0ecea67 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1132,6 +1132,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
> case CHIP_KAVERI:
> device_queue_manager_init_cik(&dqm->ops_asic_specific);
> break;
> +   default:
> +   BUG();
Replace this with some error printing.

> }
>
> if (!dqm->ops.initialize(dqm))
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 09356d0..9ebb4c1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -291,6 +291,8 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
> case CHIP_KAVERI:
> kernel_queue_init_cik(&kq->ops_asic_specific);
> break;
> +   default:
> +   BUG();
Replace this with some error printing.

> }
>
> if (!kq->ops.initialize(kq, dev, type, KFD_KERNEL_QUEUE_SIZE)) {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> index b1ef136..b5a87ba 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> @@ -31,6 +31,8 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
> return mqd_manager_init_cik(type, dev);
> case CHIP_CARRIZO:
> return mqd_manager_init_vi(type, dev);
> +   default:
> +   BUG();
Replace this with some error printing.

> }
>
> return NULL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 7bed4ef..bb71697 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -33,6 +33,8 @@
>  #include 
>  #include 
>
> +#include "amd_shared.h"
> +
>  #define KFD_SYSFS_FILE_MODE 0444
>
>  #define KFD_MMAP_DOORBELL_MASK 0x8
> @@ -112,11 +114,6 @@ enum cache_policy {
> cache_policy_noncoherent
>  };
>
> -enum asic_family_type {
> -   CHIP_KAVERI = 0,
> -   CHIP_CARRIZO
> -};
> -
>  struct kfd_event_interrupt_class {
> bool (*interrupt_isr)(struct kfd_dev *dev,
> const uint32_t *ih_ring_entry);
> @@ -125,7 +122,7 @@ struct kfd_event_interrupt_class {
>  };
>
>  struct kfd_device_info {
> -   unsigned int asic_family;
> +   enum amd_asic_type asic_family;
> const struct kfd_event_interrupt_class *event_interrupt_class;
> unsigned int max_pasid_bits;
> unsigned int max_no_of_hqd;
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 11/11] drm/amdkfd: Set /dev/kfd permissions to 0666 by default

2017-09-17 Thread Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling  wrote:
> From: Andres Rodriguez 
>
> Set the default permissions of /dev/kfd to 0666 instead of the
> root-only 0600.
>
I don't think that's acceptable.
You need to use a udev rules file for that.

Oded
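
For illustration, a udev rule of roughly this shape would set the node's permissions from userspace. The group name and the file name are assumptions made for this sketch; the thread does not specify either.

```
# /etc/udev/rules.d/70-kfd.rules (hypothetical file name)
KERNEL=="kfd", GROUP="video", MODE="0660"
```

Keeping the policy in a rules file lets each distribution or administrator decide who may open /dev/kfd without changing the driver.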

> Signed-off-by: Andres Rodriguez 
> Reviewed-by: Felix Kuehling 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index e4a8c2e..1ad9901 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -55,6 +55,14 @@ static int kfd_char_dev_major = -1;
>  static struct class *kfd_class;
>  struct device *kfd_device;
>
> +static char *kfd_devnode(struct device *dev, umode_t *mode)
> +{
> +   if (mode && dev->devt == MKDEV(kfd_char_dev_major, 0))
> +   *mode = 0666;
> +
> +   return NULL;
> +}
> +
>  int kfd_chardev_init(void)
>  {
> int err = 0;
> @@ -69,6 +77,8 @@ int kfd_chardev_init(void)
> if (IS_ERR(kfd_class))
> goto err_class_create;
>
> +   kfd_class->devnode = kfd_devnode;
> +
> kfd_device = device_create(kfd_class, NULL,
> MKDEV(kfd_char_dev_major, 0),
> NULL, kfd_dev_name);
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx