date:20191120

RE: [PATCH 2/2] drm/amdgpu: invalidate mmhub semphore workaround in gmc9/gmc10

2019-11-20 Thread Zhou1, Tao

Yes, we could delay submitting patch 2, or apply it only to MMHUB for now
and extend it to GFXHUB if the gfx hang issue is fixed in the future.

For patch 1, as Monk confirms SRIOV won't enable PG at all, I agree that patch 
1 could be dropped.

Tao
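
For readers skimming the thread, the acquire / invalidate / release sequence that patch 2 describes can be sketched in plain C as below. This is only an illustrative user-space model, not the driver code: `struct fake_hub`, `fake_read_sem` and `fake_flush_vm_hub` are hypothetical names, and the "registers" are ordinary struct fields. The one behaviour taken from the quoted patch is that a read returning 1 means the semaphore was acquired and a write of 0 releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one invalidation engine's registers. */
struct fake_hub {
	uint32_t sem;   /* 0 = free, 1 = held */
	uint32_t req;   /* last invalidation request written */
	uint32_t ack;   /* acknowledge bits, one per VMID */
};

/* Reading the semaphore tries to take it; a return value of 1 means acquired. */
static uint32_t fake_read_sem(struct fake_hub *hub)
{
	if (hub->sem)
		return 0;       /* already held by someone else */
	hub->sem = 1;
	return 1;
}

/* Returns true when the flush completed, false when the acquire timed out. */
static bool fake_flush_vm_hub(struct fake_hub *hub, uint32_t vmid,
			      unsigned int timeout)
{
	unsigned int i;

	assert(vmid < 32);

	/* Poll until the semaphore is acquired or the timeout expires. */
	for (i = 0; i < timeout; i++) {
		if (fake_read_sem(hub) & 0x1)
			break;
	}
	if (i >= timeout)
		return false;   /* never acquired, so nothing to release */

	hub->req = 1u << vmid;  /* request the invalidation */
	hub->ack |= hub->req;   /* this model acknowledges immediately */

	hub->sem = 0;           /* writing 0 releases the semaphore */
	return true;
}
```

Note that the sketch bails out on an acquire timeout, whereas the quoted patch only logs an error and continues; that difference is a simplification made here, not a proposal from the thread.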

> -----Original Message-----
> From: Christian König
> Sent: November 20, 2019 19:27
> To: Zhu, Changfeng ; Koenig, Christian
> ; Xiao, Jack ; Zhou1, Tao
> ; Huang, Ray ; Huang,
> Shimmer ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 2/2] drm/amdgpu: invalidate mmhub semphore
> workaround in gmc9/gmc10
> 
> On 20.11.19 at 10:44, Changfeng.Zhu wrote:
> > From: changzhu 
> >
> > The gpuvm invalidate acknowledge state may be lost across a power-gating
> > off cycle. To avoid this issue in gmc9/gmc10 invalidation, add a semaphore
> > acquire before invalidation and a semaphore release after invalidation.
> >
> > After the semaphore acquire is added before invalidation, the semaphore
> > register becomes read-only if another process tries to acquire the
> > semaphore. The semaphore then cannot be released, which may cause a
> > deadlock. If this deadlock happens, it needs a semaphore firmware fix.
> 
> Please remove the DRM_WARN_ONCE; that looks like overkill to me.
> 
> And I'm not sure how urgent this issue is. We could also wait a few
> more days and see if the hw guys figure out why this locks up on the GFX ring.
> 
> Regards,
> Christian.
> 
> >
> > Change-Id: I9942a2f451265c1f1038ccfe2f70042c7c8118af
> > Signed-off-by: changzhu 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 49 ++
> >   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 49 ++
> >   drivers/gpu/drm/amd/amdgpu/soc15.h |  4 +--
> >   3 files changed, 100 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > index af2615ba52aa..685d0d5ef31e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> > @@ -234,6 +234,24 @@ static void gmc_v10_0_flush_vm_hub(struct
> amdgpu_device *adev, uint32_t vmid,
> > const unsigned eng = 17;
> > unsigned int i;
> >
> > +   spin_lock(&adev->gmc.invalidate_lock);
> > +   /*
> > +* It may lose gpuvm invalidate acknowledge state across power-gating
> > +* off cycle, add semaphore acquire before invalidation and semaphore
> > +* release after invalidation to avoid entering power gated state
> > +* to WA the Issue
> > +*/
> > +   for (i = 0; i < adev->usec_timeout; i++) {
> > +   /* a read return value of 1 means semaphore acquire */
> > +   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng);
> > +   if (tmp & 0x1)
> > +   break;
> > +   udelay(1);
> > +   }
> > +
> > +   if (i >= adev->usec_timeout)
> > +   DRM_ERROR("Timeout waiting for sem acquire in VM
> flush!\n");
> > +
> > WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, tmp);
> >
> > /*
> > @@ -253,6 +271,14 @@ static void gmc_v10_0_flush_vm_hub(struct
> amdgpu_device *adev, uint32_t vmid,
> > udelay(1);
> > }
> >
> > +   /*
> > +* add semaphore release after invalidation,
> > +* write with 0 means semaphore release
> > +*/
> > +   WREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng, 0);
> > +
> > +   spin_unlock(&adev->gmc.invalidate_lock);
> > +
> > if (i < adev->usec_timeout)
> > return;
> >
> > @@ -338,6 +364,21 @@ static uint64_t
> gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
> > uint32_t req = gmc_v10_0_get_invalidate_req(vmid, 0);
> > unsigned eng = ring->vm_inv_eng;
> >
> > +   /*
> > +* It may lose gpuvm invalidate acknowledge state across power-gating
> > +* off cycle, add semaphore acquire before invalidation and semaphore
> > +* release after invalidation to avoid entering power gated state
> > +* to WA the Issue
> > +*/
> > +
> > +   /* a read return value of 1 means semaphore acquire */
> > +   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> > +   ring->funcs->vmhub == AMDGPU_MMHUB_1) {
> > +   amdgpu_ring_emit_reg_wait(ring,
> > + hub->vm_inv_eng0_sem + eng, 0x1,
> 0x1);
> > +   DRM_WARN_ONCE("Adding semaphore may cause deadlock
> and it needs firmware fix\n");
> > +   }
> > +
> > amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_lo32 + (2 * vmid),
> >   lower_32_bits(pd_addr));
> >
> > @@ -348,6 +389,14 @@ static uint64_t
> gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,
> > hub->vm_inv_eng0_ack + eng,
> > req, 1 << vmid);
> >
> > +   /*
> > +* add semaphore release after invalidation,
> > +* write with 0 means semaphore release
> > +*/
> > +   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> > +   ring->funcs->vmhub == AMDGPU_MMHUB_1)
> > +

RE: [PATCH 1/2] drm/amd/amdgpu/sriov temporarily skip ras, dtm, hdcp for arcturus VF

2019-11-20 Thread Zhang, Jack (Jian)

Hi, Team,

Could you please take a look at this patch?

BR
Jack

-----Original Message-----
From: amd-gfx  On Behalf Of Jack Zhang
Sent: Thursday, November 21, 2019 2:17 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Jack (Jian) 
Subject: [PATCH 1/2] drm/amd/amdgpu/sriov temporarily skip ras, dtm, hdcp for 
arcturus VF

Temporarily skip RAS, DTM, and HDCP initialize and terminate for the Arcturus VF.
These three features are not yet enabled under SRIOV, and running their
bare-metal paths makes the guest driver fail to load.

Signed-off-by: Jack Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 36 +
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 2a8a08a..c3a42d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -756,6 +756,12 @@ int psp_ras_enable_features(struct psp_context *psp,
 
 static int psp_ras_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->ras.ras_initialized)
@@ -777,6 +783,12 @@ static int psp_ras_terminate(struct psp_context *psp)
 
 static int psp_ras_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_ras_ucode_size ||
@@ -872,6 +884,12 @@ static int psp_hdcp_load(struct psp_context *psp)
 }
 static int psp_hdcp_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_hdcp_ucode_size ||
@@ -960,6 +978,12 @@ int psp_hdcp_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
 
 static int psp_hdcp_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->hdcp_context.hdcp_initialized)
@@ -1051,6 +1075,12 @@ static int psp_dtm_load(struct psp_context *psp)
 
 static int psp_dtm_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_dtm_ucode_size ||
@@ -1109,6 +1139,12 @@ int psp_dtm_invoke(struct psp_context *psp, uint32_t ta_cmd_id)
 
 static int psp_dtm_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->dtm_context.dtm_initialized)
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 2/2] drm/amd/amdgpu/sriov skip RLCG s/r list for arcturus VF.

2019-11-20 Thread Jack Zhang

After rlcg fw 2.1, the kmd driver starts to load extra fw for
LIST_CNTL, GPM_MEM, and SRM_MEM. We need to skip these three fw
because all rlcg-related fw has already been loaded by the host driver.
Without this change, the guest driver fails to load the three fw.

Signed-off-by: Jack Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c3a42d3..eecde80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1470,7 +1470,10 @@ static int psp_np_fw_load(struct psp_context *psp)
|| ucode->ucode_id == AMDGPU_UCODE_ID_SDMA5
|| ucode->ucode_id == AMDGPU_UCODE_ID_SDMA6
|| ucode->ucode_id == AMDGPU_UCODE_ID_SDMA7
-   || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_G))
+   || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_G
+   || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_CNTL
+   || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM
+   || ucode->ucode_id == AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM))
/*skip ucode loading in SRIOV VF */
continue;
 
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
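
As a reading aid, the skip condition that the patch above extends can be thought of as a small predicate: on an SRIOV VF, firmware that the host already loaded is skipped, and everything else is loaded as usual. The sketch below is a stand-alone model under that assumption. The enum and `fake_skip_ucode_on_vf` are hypothetical; the enumerator names only mirror the identifiers in the patch, and their values and ordering are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of firmware IDs; values and order are illustrative. */
enum fake_ucode_id {
	FAKE_UCODE_ID_SDMA7,
	FAKE_UCODE_ID_RLC_G,
	FAKE_UCODE_ID_RLC_RESTORE_LIST_CNTL,
	FAKE_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM,
	FAKE_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM,
	FAKE_UCODE_ID_OTHER,    /* any firmware the guest still loads itself */
	FAKE_UCODE_ID_COUNT
};

/* True when a guest (VF) driver should not load this firmware itself. */
static bool fake_skip_ucode_on_vf(bool is_vf, enum fake_ucode_id id)
{
	assert(id < FAKE_UCODE_ID_COUNT);

	if (!is_vf)
		return false;   /* bare metal loads everything */

	switch (id) {
	case FAKE_UCODE_ID_SDMA7:
	case FAKE_UCODE_ID_RLC_G:
	case FAKE_UCODE_ID_RLC_RESTORE_LIST_CNTL:
	case FAKE_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM:
	case FAKE_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM:
		return true;    /* already loaded by the host driver */
	default:
		return false;
	}
}
```

A `switch` is used here only because it keeps the list easy to extend; the patch itself keeps the existing chain of `||` comparisons.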

[PATCH 1/2] drm/amd/amdgpu/sriov temporarily skip ras, dtm, hdcp for arcturus VF

2019-11-20 Thread Jack Zhang

Temporarily skip RAS, DTM, and HDCP initialize and terminate for the Arcturus VF.
These three features are not yet enabled under SRIOV, and running their
bare-metal paths makes the guest driver fail to load.

Signed-off-by: Jack Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 36 +
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 2a8a08a..c3a42d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -756,6 +756,12 @@ int psp_ras_enable_features(struct psp_context *psp,
 
 static int psp_ras_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->ras.ras_initialized)
@@ -777,6 +783,12 @@ static int psp_ras_terminate(struct psp_context *psp)
 
 static int psp_ras_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_ras_ucode_size ||
@@ -872,6 +884,12 @@ static int psp_hdcp_load(struct psp_context *psp)
 }
 static int psp_hdcp_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_hdcp_ucode_size ||
@@ -960,6 +978,12 @@ int psp_hdcp_invoke(struct psp_context *psp, uint32_t 
ta_cmd_id)
 
 static int psp_hdcp_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->hdcp_context.hdcp_initialized)
@@ -1051,6 +1075,12 @@ static int psp_dtm_load(struct psp_context *psp)
 
 static int psp_dtm_initialize(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the initialize in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->adev->psp.ta_dtm_ucode_size ||
@@ -1109,6 +1139,12 @@ int psp_dtm_invoke(struct psp_context *psp, uint32_t 
ta_cmd_id)
 
 static int psp_dtm_terminate(struct psp_context *psp)
 {
+   /*
+* TODO: bypass the terminate in sriov for now
+*/
+   if (amdgpu_sriov_vf(psp->adev))
+   return 0;
+
int ret;
 
if (!psp->dtm_context.dtm_initialized)
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
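
The early-return pattern used six times in the patch above can be sketched as follows. This is a hedged user-space model, not the PSP code: `struct fake_psp` and `fake_ras_terminate` are hypothetical. One deliberate difference from the posted hunks is that the local declaration comes before the VF check, since kernel builds of that era warned about declarations placed after statements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PSP context. */
struct fake_psp {
	bool is_sriov_vf;       /* models amdgpu_sriov_vf(psp->adev) */
	bool ras_initialized;
	int  terminate_calls;   /* counts how often the bare-metal path ran */
};

static int fake_ras_terminate(struct fake_psp *psp)
{
	int ret = 0;            /* declarations first, then the VF check */

	assert(psp != NULL);

	/* Bypass the terminate path on an SRIOV VF for now. */
	if (psp->is_sriov_vf)
		return 0;

	if (!psp->ras_initialized)
		return 0;

	/* Bare-metal path: tear the feature down. */
	psp->terminate_calls++;
	psp->ras_initialized = false;
	return ret;
}
```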

Re: [PATCH] drm/amdgpu: implement TMZ accessor (v2)

2019-11-20 Thread Luben Tuikov

On 2019-11-20 10:21 p.m., Luben Tuikov wrote:
> On 2019-11-20 10:02 p.m., Liu, Aaron wrote:
>>> -----Original Message-----
>>> From: amd-gfx  On Behalf Of
>>> Luben Tuikov
>>> Sent: Thursday, November 21, 2019 9:33 AM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Deucher, Alexander ; Tuikov, Luben
>>> ; Koenig, Christian 
>>> Subject: [PATCH] drm/amdgpu: implement TMZ accessor (v2)
>>>
>>> Implement an accessor for adev->tmz.enabled. Code elsewhere should not
>>> access it directly as "if (adev->tmz.enabled)", since the organization
>>> may change. Instead...
>>>
>>> Use "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value.
>>> That is, this function is now an accessor for an already initialized and
>>> set adev and adev->tmz.
>>>
>>> Add "void amdgpu_tmz_set(adev)" to check and set adev->tmz.* at
>>> initialization time, after which one uses "bool amdgpu_is_tmz(adev)" to
>>> query whether adev supports TMZ.
>>>
>>> Also, remove circular header file include.
>>>
>>> v2: Remove amdgpu_tmz.[ch] as requested.
>>>
>>> Signed-off-by: Luben Tuikov 
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 ++-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 23 ++
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  9 
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c| 52 --
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h| 39 
>>>  7 files changed, 39 insertions(+), 95 deletions(-)  delete mode 100644
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c
>>>  delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>>> index 83ee1c676e3a..7ae3b22c5628 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>> @@ -55,7 +55,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>> amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o
>>> amdgpu_ids.o \
>>> amdgpu_gmc.o amdgpu_mmhub.o amdgpu_xgmi.o amdgpu_csa.o
>>> amdgpu_ras.o amdgpu_vm_cpu.o \
>>> amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o
>>> amdgpu_nbio.o \
>>> -   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_tmz.o
>>> +   amdgpu_umc.o smu_v11_0_i2c.o
>>>
>>>  amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index d120fe58ebea..805e12ef13ea 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -90,7 +90,6 @@
>>>  #include "amdgpu_mes.h"
>>>  #include "amdgpu_umc.h"
>>>  #include "amdgpu_mmhub.h"
>>> -#include "amdgpu_tmz.h"
>>>
>>>  #define MAX_GPU_INSTANCE   16
>>>
>>> @@ -1266,5 +1265,10 @@ _name##_show(struct device *dev,
>>> \
>>> \
>>>  static struct device_attribute pmu_attr_##_name = __ATTR_RO(_name)
>>>
>>> +static inline bool amdgpu_is_tmz(struct amdgpu_device *adev) {
>>> +   return adev->tmz.enabled;
>>> +}
>>> +
>>>  #endif
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index b1408c5e4640..56836054e6a8 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -64,7 +64,6 @@
>>>  #include "amdgpu_xgmi.h"
>>>  #include "amdgpu_ras.h"
>>>  #include "amdgpu_pmu.h"
>>> -#include "amdgpu_tmz.h"
>>>
>>>  #include 
>>>
>>> @@ -1073,7 +1072,7 @@ static int amdgpu_device_check_arguments(struct
>>> amdgpu_device *adev)
>>>
>>> adev->firmware.load_type = amdgpu_ucode_get_load_type(adev,
>>> amdgpu_fw_load_type);
>>>
>>> -   adev->tmz.enabled = amdgpu_is_tmz(adev);
>>> +   amdgpu_tmz_set(adev);
>>>
>>> return ret;
>>>  }
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> index a12f33c0f5df..a0245d8b2f37 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> @@ -333,3 +333,26 @@ void amdgpu_gmc_ras_fini(struct amdgpu_device
>>> *adev)
>>> amdgpu_mmhub_ras_fini(adev);
>>> amdgpu_xgmi_ras_fini(adev);
>>>  }
>>> +
>>> +/**
>>> + * amdgpu_tmz_set -- check and set if a device supports TMZ
>>> + * @adev: amdgpu_device pointer
>>> + *
>>> + * Check and set if an the device @adev supports Trusted Memory
>>> + * Zones (TMZ).
>>> + */
>>> +void amdgpu_tmz_set(struct amdgpu_device *adev) {
>>> +   if (!amdgpu_tmz)
>>> +   return;
>>> +
>>> +   if (adev->asic_type < CHIP_RAVEN ||
>>> +   adev->asic_type == CHIP_ARCTURUS) {
>>> +   dev_warn(adev->dev, "Trusted Memory Zone (TMZ) feature
>>> not supported\n");
>>> +   return;
>>> +   }
>>> +
>>> +   adev->tmz.enabled = true;
>>> +
>>> +   dev_info(adev->dev, "Trusted Memory Zone (TMZ) feature
>>>

Re: [PATCH] drm/amdgpu: implement TMZ accessor (v2)

2019-11-20 Thread Luben Tuikov

On 2019-11-20 10:02 p.m., Liu, Aaron wrote:
>> -----Original Message-----
>> From: amd-gfx  On Behalf Of
>> Luben Tuikov
>> Sent: Thursday, November 21, 2019 9:33 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander ; Tuikov, Luben
>> ; Koenig, Christian 
>> Subject: [PATCH] drm/amdgpu: implement TMZ accessor (v2)
>>
>> Implement an accessor for adev->tmz.enabled. Code elsewhere should not
>> access it directly as "if (adev->tmz.enabled)", since the organization
>> may change. Instead...
>>
>> Use "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value.
>> That is, this function is now an accessor for an already initialized and
>> set adev and adev->tmz.
>>
>> Add "void amdgpu_tmz_set(adev)" to check and set adev->tmz.* at
>> initialization time, after which one uses "bool amdgpu_is_tmz(adev)" to
>> query whether adev supports TMZ.
>>
>> Also, remove circular header file include.
>>
>> v2: Remove amdgpu_tmz.[ch] as requested.
>>
>> Signed-off-by: Luben Tuikov 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 ++-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 23 ++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  9 
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c| 52 --
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h| 39 
>>  7 files changed, 39 insertions(+), 95 deletions(-)  delete mode 100644
>> drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c
>>  delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 83ee1c676e3a..7ae3b22c5628 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -55,7 +55,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>  amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o
>> amdgpu_ids.o \
>>  amdgpu_gmc.o amdgpu_mmhub.o amdgpu_xgmi.o amdgpu_csa.o
>> amdgpu_ras.o amdgpu_vm_cpu.o \
>>  amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o
>> amdgpu_nbio.o \
>> -amdgpu_umc.o smu_v11_0_i2c.o amdgpu_tmz.o
>> +amdgpu_umc.o smu_v11_0_i2c.o
>>
>>  amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d120fe58ebea..805e12ef13ea 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -90,7 +90,6 @@
>>  #include "amdgpu_mes.h"
>>  #include "amdgpu_umc.h"
>>  #include "amdgpu_mmhub.h"
>> -#include "amdgpu_tmz.h"
>>
>>  #define MAX_GPU_INSTANCE16
>>
>> @@ -1266,5 +1265,10 @@ _name##_show(struct device *dev,
>>  \
>>  \
>>  static struct device_attribute pmu_attr_##_name = __ATTR_RO(_name)
>>
>> +static inline bool amdgpu_is_tmz(struct amdgpu_device *adev) {
>> +   return adev->tmz.enabled;
>> +}
>> +
>>  #endif
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index b1408c5e4640..56836054e6a8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -64,7 +64,6 @@
>>  #include "amdgpu_xgmi.h"
>>  #include "amdgpu_ras.h"
>>  #include "amdgpu_pmu.h"
>> -#include "amdgpu_tmz.h"
>>
>>  #include 
>>
>> @@ -1073,7 +1072,7 @@ static int amdgpu_device_check_arguments(struct
>> amdgpu_device *adev)
>>
>>  adev->firmware.load_type = amdgpu_ucode_get_load_type(adev,
>> amdgpu_fw_load_type);
>>
>> -adev->tmz.enabled = amdgpu_is_tmz(adev);
>> +amdgpu_tmz_set(adev);
>>
>>  return ret;
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>> index a12f33c0f5df..a0245d8b2f37 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>> @@ -333,3 +333,26 @@ void amdgpu_gmc_ras_fini(struct amdgpu_device
>> *adev)
>>  amdgpu_mmhub_ras_fini(adev);
>>  amdgpu_xgmi_ras_fini(adev);
>>  }
>> +
>> +/**
>> + * amdgpu_tmz_set -- check and set if a device supports TMZ
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Check and set if an the device @adev supports Trusted Memory
>> + * Zones (TMZ).
>> + */
>> +void amdgpu_tmz_set(struct amdgpu_device *adev) {
>> +if (!amdgpu_tmz)
>> +return;
>> +
>> +if (adev->asic_type < CHIP_RAVEN ||
>> +adev->asic_type == CHIP_ARCTURUS) {
>> +dev_warn(adev->dev, "Trusted Memory Zone (TMZ) feature
>> not supported\n");
>> +return;
>> +}
>> +
>> +adev->tmz.enabled = true;
>> +
>> +dev_info(adev->dev, "Trusted Memory Zone (TMZ) feature
>> supported and
>> +enabled\n"); }
> 
> Hi Luben,
> TMZ is just a specific feature, and I think moving amdgpu_tmz into
> amdgpu_gmc.h is a nice change.
> Another

RE: [PATCH] drm/amdgpu: implement TMZ accessor (v2)

2019-11-20 Thread Liu, Aaron

> -----Original Message-----
> From: amd-gfx  On Behalf Of
> Luben Tuikov
> Sent: Thursday, November 21, 2019 9:33 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Tuikov, Luben
> ; Koenig, Christian 
> Subject: [PATCH] drm/amdgpu: implement TMZ accessor (v2)
> 
> Implement an accessor for adev->tmz.enabled. Code elsewhere should not
> access it directly as "if (adev->tmz.enabled)", since the organization
> may change. Instead...
> 
> Use "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value.
> That is, this function is now an accessor for an already initialized and
> set adev and adev->tmz.
> 
> Add "void amdgpu_tmz_set(adev)" to check and set adev->tmz.* at
> initialization time, after which one uses "bool amdgpu_is_tmz(adev)" to
> query whether adev supports TMZ.
> 
> Also, remove circular header file include.
> 
> v2: Remove amdgpu_tmz.[ch] as requested.
> 
> Signed-off-by: Luben Tuikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 23 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  9 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c| 52 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h| 39 
>  7 files changed, 39 insertions(+), 95 deletions(-)  delete mode 100644
> drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c
>  delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 83ee1c676e3a..7ae3b22c5628 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -55,7 +55,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o
> amdgpu_ids.o \
>   amdgpu_gmc.o amdgpu_mmhub.o amdgpu_xgmi.o amdgpu_csa.o
> amdgpu_ras.o amdgpu_vm_cpu.o \
>   amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o
> amdgpu_nbio.o \
> - amdgpu_umc.o smu_v11_0_i2c.o amdgpu_tmz.o
> + amdgpu_umc.o smu_v11_0_i2c.o
> 
>  amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index d120fe58ebea..805e12ef13ea 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -90,7 +90,6 @@
>  #include "amdgpu_mes.h"
>  #include "amdgpu_umc.h"
>  #include "amdgpu_mmhub.h"
> -#include "amdgpu_tmz.h"
> 
>  #define MAX_GPU_INSTANCE 16
> 
> @@ -1266,5 +1265,10 @@ _name##_show(struct device *dev,
>   \
>   \
>  static struct device_attribute pmu_attr_##_name = __ATTR_RO(_name)
> 
> +static inline bool amdgpu_is_tmz(struct amdgpu_device *adev) {
> +   return adev->tmz.enabled;
> +}
> +
>  #endif
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index b1408c5e4640..56836054e6a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -64,7 +64,6 @@
>  #include "amdgpu_xgmi.h"
>  #include "amdgpu_ras.h"
>  #include "amdgpu_pmu.h"
> -#include "amdgpu_tmz.h"
> 
>  #include 
> 
> @@ -1073,7 +1072,7 @@ static int amdgpu_device_check_arguments(struct
> amdgpu_device *adev)
> 
>   adev->firmware.load_type = amdgpu_ucode_get_load_type(adev,
> amdgpu_fw_load_type);
> 
> - adev->tmz.enabled = amdgpu_is_tmz(adev);
> + amdgpu_tmz_set(adev);
> 
>   return ret;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index a12f33c0f5df..a0245d8b2f37 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -333,3 +333,26 @@ void amdgpu_gmc_ras_fini(struct amdgpu_device
> *adev)
>   amdgpu_mmhub_ras_fini(adev);
>   amdgpu_xgmi_ras_fini(adev);
>  }
> +
> +/**
> + * amdgpu_tmz_set -- check and set if a device supports TMZ
> + * @adev: amdgpu_device pointer
> + *
> + * Check and set if an the device @adev supports Trusted Memory
> + * Zones (TMZ).
> + */
> +void amdgpu_tmz_set(struct amdgpu_device *adev) {
> + if (!amdgpu_tmz)
> + return;
> +
> + if (adev->asic_type < CHIP_RAVEN ||
> + adev->asic_type == CHIP_ARCTURUS) {
> + dev_warn(adev->dev, "Trusted Memory Zone (TMZ) feature
> not supported\n");
> + return;
> + }
> +
> + adev->tmz.enabled = true;
> +
> + dev_info(adev->dev, "Trusted Memory Zone (TMZ) feature
> supported and
> +enabled\n"); }

Hi Luben,
TMZ is just a specific feature, and I think moving amdgpu_tmz into
amdgpu_gmc.h is a nice change.
Another thing: you could rename amdgpu_tmz_set to amdgpu_gmc_tmz_set in
amdgpu_gmc.h/amdgpu_gmc.c.
In amdgpu_gmc.c, all functions are prefixed with

Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Yuan, Xiaojie

Hi Evan,

Yes, I know this commit, but the reason for the csb corruption is different.
The fix you mentioned is for the suspend code path, where the bo is not unpinned
at fini time. It is then not evicted (it stays in vram) and gets corrupted when
vram is powered down.
In the gpu reset code path, bos are not evicted, so we need to re-init the csb.

BR,
Xiaojie
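
To make the reset-path argument concrete, here is a small stand-alone model of it. It is a sketch under stated assumptions, not the gfx10 code: `struct fake_gfx`, the template contents, and all `fake_*` helpers are hypothetical, and a plain array stands in for the buffer object in vram. The only idea carried over from the patch is that the init routine refills the clear state buffer when it runs during a gpu reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FAKE_CSB_WORDS 4

/* Hypothetical device state; csb_vram models a buffer living in vram. */
struct fake_gfx {
	bool in_gpu_reset;
	uint32_t csb_vram[FAKE_CSB_WORDS];
};

/* Illustrative clear-state contents, not real packet data. */
static const uint32_t fake_csb_template[FAKE_CSB_WORDS] = {
	0xC0DE0001u, 0xC0DE0002u, 0xC0DE0003u, 0xC0DE0004u
};

static void fake_fill_csb(struct fake_gfx *gfx)
{
	memcpy(gfx->csb_vram, fake_csb_template, sizeof(fake_csb_template));
}

/* A reset leaves vram contents undefined; model that as garbage. */
static void fake_gpu_reset(struct fake_gfx *gfx)
{
	memset(gfx->csb_vram, 0xFF, sizeof(gfx->csb_vram));
	gfx->in_gpu_reset = true;
}

/* Init path: the buffer was not evicted, so refill it after a reset. */
static int fake_init_csb(struct fake_gfx *gfx)
{
	assert(gfx != NULL);

	if (gfx->in_gpu_reset)
		fake_fill_csb(gfx);

	/* The real driver would program the buffer address here. */
	return 0;
}

static bool fake_csb_valid(const struct fake_gfx *gfx)
{
	return memcmp(gfx->csb_vram, fake_csb_template,
		      sizeof(fake_csb_template)) == 0;
}
```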

> On Nov 21, 2019, at 9:28 AM, Quan, Evan  wrote:
> 
> Hmm, this seems to be an old issue that was found on gfx v9.
> 
>drm/amdgpu: pin the csb buffer on hw init v2
> 
>Without this pin, the csb buffer will be filled with inconsistent
>data after S3 resume. And that will causes gfx hang on gfxoff
>exit since this csb will be executed then.
> 
>v2: fit amdgpu_bo_pin change(take one less argument)
> 
> Maybe we need to find a solution suitable for all ASICs.
> 
> Regards,
> Evan
> -----Original Message-----
> From: amd-gfx  On Behalf Of Yuan, 
> Xiaojie
> Sent: Wednesday, November 20, 2019 5:10 PM
> To: Zhang, Hawking ; Koenig, Christian 
> ; amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack ; Ma, Le 
> 
> Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
> reset
> 
> Got it.
> Thanks Hawking and Chris for your attention.
> 
> BR,
> Xiaojie
> 
> 
> From: Zhang, Hawking 
> Sent: Wednesday, November 20, 2019 5:04 PM
> To: Yuan, Xiaojie; Koenig, Christian; amd-gfx@lists.freedesktop.org
> Cc: Long, Gang; Xiao, Jack; Ma, Le
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
> reset
> 
> I'm okay with re-initializing; it's the more straightforward approach.
> 
> Regards,
> Hawking
> -----Original Message-----
> From: Yuan, Xiaojie
> Sent: November 20, 2019 17:00
> To: Zhang, Hawking ; Koenig, Christian 
> ; amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
> reset
> 
> I can try this approach as well.
> This csb is similar to the kiq mqd: both are allocated in vram and might
> be corrupted after a baco reset.
> 
> BR,
> Xiaojie
> 
> 
> From: Zhang, Hawking 
> Sent: Wednesday, November 20, 2019 4:54 PM
> To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
> Cc: Long, Gang; Xiao, Jack
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
> reset
> 
> I think we should evict the bo and then move it back.
> 
> Regards,
> Hawking
> 
> -----Original Message-----
> From: Christian König
> Sent: November 20, 2019 16:47
> To: Zhang, Hawking ; Yuan, Xiaojie 
> ; amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
> reset
> 
> A baco reset also resets the MC, doesn't it? In this case it would be expected
> that the content of VRAM is corrupted.
> 
> Christian.
> 
>> On 20.11.19 at 09:45, Zhang, Hawking wrote:
>> Or in other words, it is still not clear when the corruption actually
>> happens, right?
>> 
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: amd-gfx  On Behalf Of 
>> Zhang, Hawking
>> Sent: November 20, 2019 16:44
>> To: Yuan, Xiaojie ; 
>> amd-gfx@lists.freedesktop.org
>> Cc: Long, Gang ; Xiao, Jack 
>> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer 
>> after gpu reset
>> 
>> Just to make sure I understand you correctly: until the fw team root-causes
>> the csb corruption, we keep the workaround in the driver, correct?
>> 
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: Yuan, Xiaojie
>> Sent: November 20, 2019 14:47
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking ; Xiao, Jack 
>> ; Long, Gang ; Yuan, Xiaojie 
>> 
>> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after 
>> gpu reset
>> 
>> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>> 
>> The clear state buffer (which resides in vram) is corrupted after the 1st
>> baco reset; upon gfxoff exit, CPF gets a garbage header in CSIB and hangs.
>> 
>> Signed-off-by: Xiaojie Yuan 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>>  1 file changed, 37 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> index 9274bd4b6c68..8e24ea08ca39 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>> @@ -1789,27 +1789,52 @@ static void 
>> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>>  WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
>>  }
>> 
>> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
>> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>>  {
>> + int r;
>> +
>> + if (adev->in_gpu_reset) {
>> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
>> + if (r)
>> + return r;
>> +
>> + r =

RE: [PATCH] drm/amdgpu: simplify runtime suspend

2019-11-20 Thread Quan, Evan

Acked-by: Evan Quan 

> -----Original Message-----
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Thursday, November 21, 2019 3:23 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu: simplify runtime suspend
> 
> In the standard _PR3 case, the pci core handles the pci state.
> The driver only needs to handle it in the legacy ATPX case.
> 
> This may fix issues with runtime suspend/resume on certain
> hybrid graphics laptops.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 35 -
>  1 file changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index ea763bc5b233..df2f4720a2f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1227,13 +1227,17 @@ static int amdgpu_pmops_runtime_suspend(struct
> device *dev)
> 
>   ret = amdgpu_device_suspend(drm_dev, false, false);
>   if (amdgpu_device_supports_boco(drm_dev)) {
> - pci_save_state(pdev);
> - pci_disable_device(pdev);
> - pci_ignore_hotplug(pdev);
> - if (amdgpu_is_atpx_hybrid())
> + /* Only need to handle PCI state in the driver for ATPX
> +  * PCI core handles it for _PR3.
> +  */
> + if (amdgpu_is_atpx_hybrid()) {
> + pci_ignore_hotplug(pdev);
> + } else {
> + pci_save_state(pdev);
> + pci_disable_device(pdev);
> + pci_ignore_hotplug(pdev);
>   pci_set_power_state(pdev, PCI_D3cold);
> - else if (!amdgpu_has_atpx_dgpu_power_cntl())
> - pci_set_power_state(pdev, PCI_D3hot);
> + }
>   drm_dev->switch_power_state =
> DRM_SWITCH_POWER_DYNAMIC_OFF;
>   } else if (amdgpu_device_supports_baco(drm_dev)) {
>   amdgpu_device_baco_enter(drm_dev);
> @@ -1255,14 +1259,19 @@ static int amdgpu_pmops_runtime_resume(struct
> device *dev)
>   if (amdgpu_device_supports_boco(drm_dev)) {
>   drm_dev->switch_power_state =
> DRM_SWITCH_POWER_CHANGING;
> 
> - if (amdgpu_is_atpx_hybrid() ||
> - !amdgpu_has_atpx_dgpu_power_cntl())
> + /* Only need to handle PCI state in the driver for ATPX
> +  * PCI core handles it for _PR3.
> +  */
> + if (amdgpu_is_atpx_hybrid()) {
> + pci_set_master(pdev);
> + } else {
>   pci_set_power_state(pdev, PCI_D0);
> - pci_restore_state(pdev);
> - ret = pci_enable_device(pdev);
> - if (ret)
> - return ret;
> - pci_set_master(pdev);
> + pci_restore_state(pdev);
> + ret = pci_enable_device(pdev);
> + if (ret)
> + return ret;
> + pci_set_master(pdev);
> + }
>   } else if (amdgpu_device_supports_baco(drm_dev)) {
>   amdgpu_device_baco_exit(drm_dev);
>   }
> --
> 2.23.0
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: implement TMZ accessor (v2)

2019-11-20 Thread Luben Tuikov

Implement an accessor for adev->tmz.enabled. Code elsewhere
should not access it directly as "if (adev->tmz.enabled)",
as the organization may change. Instead...

Recruit "bool amdgpu_is_tmz(adev)" to return
exactly this Boolean value. That is, this function
is now an accessor of an already initialized and
set adev and adev->tmz.

Add "void amdgpu_tmz_set(adev)" to check and set
adev->tmz.* at initialization time. After which
one uses "bool amdgpu_is_tmz(adev)" to query
whether adev supports TMZ.

Also, remove circular header file include.

v2: Remove amdgpu_tmz.[ch] as requested.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/Makefile|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c| 23 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h|  9 
 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c| 52 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h| 39 
 7 files changed, 39 insertions(+), 95 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c
 delete mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.h
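
Note for readers: the set-once / query-through-accessor split described in
the commit message can be sketched in isolation as below. This is only an
illustration under stated assumptions, not the driver code: every sketch_*
name, the four-value enum, and sketch_tmz_param (a stand-in for the
amdgpu_tmz module parameter) are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ASIC ordering; only the relative order matters here. */
enum sketch_asic_type {
	SKETCH_CHIP_OLDER,
	SKETCH_CHIP_RAVEN,
	SKETCH_CHIP_ARCTURUS,
	SKETCH_CHIP_NEWER,
};

/* Trusted Memory Zone particulars (mirrors the one-field struct in the patch). */
struct sketch_tmz {
	bool enabled;
};

/* Hypothetical, cut-down device structure. */
struct sketch_device {
	enum sketch_asic_type asic_type;
	struct sketch_tmz tmz;
};

/* Stand-in for the amdgpu_tmz module parameter (assumption). */
static int sketch_tmz_param = 1;

/* Called once at init time: check support and record the result. */
static void sketch_tmz_set(struct sketch_device *dev)
{
	if (!sketch_tmz_param)
		return;

	if (dev->asic_type < SKETCH_CHIP_RAVEN ||
	    dev->asic_type == SKETCH_CHIP_ARCTURUS)
		return; /* not supported, leave tmz.enabled false */

	dev->tmz.enabled = true;
}

/* Accessor: the only way other code should query the state. */
static inline bool sketch_is_tmz(const struct sketch_device *dev)
{
	return dev->tmz.enabled;
}
```

Callers only go through sketch_is_tmz(), so the layout of the tmz struct
can change later without touching them.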

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 83ee1c676e3a..7ae3b22c5628 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -55,7 +55,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o amdgpu_ids.o \
amdgpu_gmc.o amdgpu_mmhub.o amdgpu_xgmi.o amdgpu_csa.o amdgpu_ras.o 
amdgpu_vm_cpu.o \
amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
-   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_tmz.o
+   amdgpu_umc.o smu_v11_0_i2c.o
 
 amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d120fe58ebea..805e12ef13ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -90,7 +90,6 @@
 #include "amdgpu_mes.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_mmhub.h"
-#include "amdgpu_tmz.h"
 
 #define MAX_GPU_INSTANCE   16
 
@@ -1266,5 +1265,10 @@ _name##_show(struct device *dev, 
\
\
 static struct device_attribute pmu_attr_##_name = __ATTR_RO(_name)
 
+static inline bool amdgpu_is_tmz(struct amdgpu_device *adev)
+{
+   return adev->tmz.enabled;
+}
+
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index b1408c5e4640..56836054e6a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -64,7 +64,6 @@
 #include "amdgpu_xgmi.h"
 #include "amdgpu_ras.h"
 #include "amdgpu_pmu.h"
-#include "amdgpu_tmz.h"
 
 #include 
 
@@ -1073,7 +1072,7 @@ static int amdgpu_device_check_arguments(struct 
amdgpu_device *adev)
 
adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, 
amdgpu_fw_load_type);
 
-   adev->tmz.enabled = amdgpu_is_tmz(adev);
+   amdgpu_tmz_set(adev);
 
return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index a12f33c0f5df..a0245d8b2f37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -333,3 +333,26 @@ void amdgpu_gmc_ras_fini(struct amdgpu_device *adev)
amdgpu_mmhub_ras_fini(adev);
amdgpu_xgmi_ras_fini(adev);
 }
+
+/**
+ * amdgpu_tmz_set -- check and set if a device supports TMZ
+ * @adev: amdgpu_device pointer
+ *
+ * Check and set whether the device @adev supports Trusted Memory
+ * Zones (TMZ).
+ */
+void amdgpu_tmz_set(struct amdgpu_device *adev)
+{
+   if (!amdgpu_tmz)
+   return;
+
+   if (adev->asic_type < CHIP_RAVEN ||
+   adev->asic_type == CHIP_ARCTURUS) {
+   dev_warn(adev->dev, "Trusted Memory Zone (TMZ) feature not 
supported\n");
+   return;
+   }
+
+   adev->tmz.enabled = true;
+
+   dev_info(adev->dev, "Trusted Memory Zone (TMZ) feature supported and 
enabled\n");
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 406736a1bd3d..1abd935a073e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -267,4 +267,13 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, 
uint64_t addr,
 int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev);
 void amdgpu_gmc_ras_fini(struct amdgpu_device *adev);
 
+/*
+ * Trusted Memory Zone particulars
+ */
+struct amdgpu_tmz {
+   bool enabled;
+};
+
+extern void amdgpu_tmz_set(struct amdgpu_device *adev);
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_tmz.c
deleted file mode 100644
index

RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Quan, Evan

Hmm, this seems to be an old issue that was found on gfx v9.

drm/amdgpu: pin the csb buffer on hw init v2

Without this pin, the csb buffer will be filled with inconsistent
data after S3 resume. That will cause a gfx hang on gfxoff
exit, since this csb will be executed then.

v2: fit amdgpu_bo_pin change(take one less argument)

Maybe we need to find a solution suitable for all ASICs.

Regards,
Evan
-Original Message-
From: amd-gfx  On Behalf Of Yuan, Xiaojie
Sent: Wednesday, November 20, 2019 5:10 PM
To: Zhang, Hawking ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack ; Ma, Le 

Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Got it.
Thanks Hawking and Chris for your attention.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 5:04 PM
To: Yuan, Xiaojie; Koenig, Christian; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack; Ma, Le
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I'm okay with re-initializing; it's the more straightforward approach.

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: November 20, 2019 17:00
To: Zhang, Hawking ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I can try this approach as well.
The csb is similar to the kiq mqd: both are allocated in vram and might
be corrupted after a baco reset.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König 
Sent: November 20, 2019 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected
that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, it is still not clear when the corruption actually
> happens, right?
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Zhang, Hawking
> Sent: November 20, 2019 16:44
> To: Yuan, Xiaojie ; 
> amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer 
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes
> the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: November 20, 2019 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Xiao, Jack 
> ; Long, Gang ; Yuan, Xiaojie 
> 
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after 
> gpu reset
>
> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>
> clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
> gfxoff exit, CPF gets garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void 
> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>   WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
>   }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> + int r;
> +
> + if (adev->in_gpu_reset) {
> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> + if (r)
> + return r;
> +
> + r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +(void **)&adev->gfx.rlc.cs_ptr);
> + if (!r) {
> + adev->gfx.rlc.funcs->get_csb_buffer(adev,
> + adev->gfx.rlc.cs_ptr);
> + amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> + }
> +
> + amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> + if (r)
> + return r;
> + }
> +
>   /* csib */
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>adev->gfx.rlc.clear_state_gpu_addr >> 32);
>   WREG32_SOC15(GC, 0,

[pull] amdgpu drm-fixes-5.4

2019-11-20 Thread Alex Deucher

Hi Dave, Daniel,

A few fixes for 5.4.  Stability and regression fixes. Most CCed to stable.

The following changes since commit af42d3466bdc8f39806b26f593604fdc54140bcb:

  Linux 5.4-rc8 (2019-11-17 14:47:30 -0800)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/drm-fixes-5.4-2019-11-20

for you to fetch changes up to a0184d71163aab258d73141a8839675d6cbdcf40:

  Revert "drm/amd/display: enable S/G for RAVEN chip" (2019-11-20 18:40:21 
-0500)


drm-fixes-5.4-2019-11-20:

amdgpu:
- Remove experimental flag for navi14
- Fix confusing power message failures on older VI parts
- Hang fix for gfxoff when using the read register interface
- Two stability regression fixes for Raven


Alex Deucher (4):
  drm/amdgpu: remove experimental flag for Navi14
  drm/amdgpu: disable gfxoff when using register read interface
  drm/amdgpu: disable gfxoff on original raven
  Revert "drm/amd/display: enable S/G for RAVEN chip"

Evan Quan (2):
  drm/amd/powerplay: issue no PPSMC_MSG_GetCurrPkgPwr on unsupported ASICs
  drm/amd/powerplay: correct fine grained dpm force level setting

 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  6 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |  9 +++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 +-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c  | 23 ++-
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c|  6 ++
 7 files changed, 42 insertions(+), 14 deletions(-)

Re: [PATCH] drm/amd/display: re-enable wait in pipelock, but add timeout

2019-11-20 Thread Alex Deucher

Ping?

On Fri, Nov 15, 2019 at 11:01 AM Alex Deucher  wrote:
>
> Removing this causes hangs in some games, so re-add it, but add
> a timeout so we don't hang while switching flip types.
>
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=205169
> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=112266
> Signed-off-by: Alex Deucher 
> ---
>  .../drm/amd/display/dc/dcn20/dcn20_hwseq.c| 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c 
> b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
> index 0046a099c9e9..120e62c5193c 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
> @@ -1033,6 +1033,25 @@ void dcn20_pipe_control_lock(
> if (pipe->plane_state != NULL)
> flip_immediate = pipe->plane_state->flip_immediate;
>
> +   if (flip_immediate && lock) {
> +   const int TIMEOUT_FOR_FLIP_PENDING = 10;
> +   int i;
> +
> +   for (i = 0; i < TIMEOUT_FOR_FLIP_PENDING; ++i) {
> +   if 
> (!pipe->plane_res.hubp->funcs->hubp_is_flip_pending(pipe->plane_res.hubp))
> +   break;
> +   udelay(1);
> +   }
> +
> +   if (pipe->bottom_pipe != NULL) {
> +   for (i = 0; i < TIMEOUT_FOR_FLIP_PENDING; ++i) {
> +   if 
> (!pipe->bottom_pipe->plane_res.hubp->funcs->hubp_is_flip_pending(pipe->bottom_pipe->plane_res.hubp))
> +   break;
> +   udelay(1);
> +   }
> +   }
> +   }
> +
> /* In flip immediate and pipe splitting case, we need to use GSL
>  * for synchronization. Only do setup on locking and on flip type 
> change.
>  */
> --
> 2.23.0
>
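
For readers following along, the "wait, but give up after a bounded number
of polls" idea in the quoted patch can be sketched standalone as below. This
is a sketch only: struct fake_hubp, fake_is_flip_pending() and fake_udelay()
are hypothetical stand-ins for the real hubp callback and udelay(), and the
boolean return value is an addition for illustration (the quoted patch simply
falls through when the loop ends).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hubp: pending for a fixed number of polls. */
struct fake_hubp {
	int polls_until_done; /* how many more polls report "pending" */
	int polls_seen;       /* how many times we were polled */
};

static bool fake_is_flip_pending(struct fake_hubp *hubp)
{
	hubp->polls_seen++;
	if (hubp->polls_until_done > 0) {
		hubp->polls_until_done--;
		return true;
	}
	return false;
}

/* Stand-in for udelay(); does nothing in this sketch. */
static void fake_udelay(unsigned int usec)
{
	(void)usec;
}

/*
 * Poll at most max_polls times. Returns true if the flip was seen to
 * complete, false if the poll budget ran out first.
 */
static bool wait_for_flip(struct fake_hubp *hubp, int max_polls)
{
	int i;

	for (i = 0; i < max_polls; ++i) {
		if (!fake_is_flip_pending(hubp))
			return true;
		fake_udelay(1);
	}
	return false;
}
```

With a budget of 10 polls and a 1 us delay per poll, as in the quoted patch,
the worst-case added wait per pipe is roughly 10 us plus the cost of the
polls themselves.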

[PATCH] drm/amdgpu: simplify padding calculations (v2) (REPOST)

2019-11-20 Thread Luben Tuikov

Simplify padding calculations.

v2: Comment update and spacing.

Signed-off-by: Luben Tuikov 
---
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c  |  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 17 -
 5 files changed, 20 insertions(+), 13 deletions(-)
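
Note for readers: the equivalence behind the simplification can be checked
with a small standalone program. The sketch below is not driver code; the
function names and the 'target' parameter are introduced here for
illustration (target is 4 in the cik_sdma hunk and 2 in the sdma v2.4, v3.0
and v4.0 hunks). It relies on unsigned arithmetic wrapping modulo 2^32, so
that masking with 7 gives the result modulo 8.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: NOPs needed to bring length_dw up to a multiple of 8. */
static uint32_t pad_count_old(uint32_t length_dw)
{
	return (8 - (length_dw & 0x7)) % 8;
}

/* New form: unsigned negation, then keep the low three bits. */
static uint32_t pad_count_new(uint32_t length_dw)
{
	return (-length_dw) & 7;
}

/* Old ring form: NOPs so that (wptr + nops) % 8 == target. */
static uint32_t ring_nops_old(uint32_t wptr, uint32_t target)
{
	return ((8 + target) - (wptr & 7)) % 8;
}

/* New ring form: same value, computed with a subtraction and a mask. */
static uint32_t ring_nops_new(uint32_t wptr, uint32_t target)
{
	return (target - wptr) & 7;
}

/* Compare old and new forms over several full periods of 8. */
static void check_padding_equivalence(void)
{
	uint32_t x;

	for (x = 0; x < 64; x++) {
		assert(pad_count_old(x) == pad_count_new(x));
		assert(((x + pad_count_new(x)) & 7) == 0);
		assert(ring_nops_old(x, 4) == ring_nops_new(x, 4));
		assert(ring_nops_old(x, 2) == ring_nops_new(x, 2));
	}
}
```

For example, a length of 5 DW needs 3 NOPs in both forms, and a write
pointer at 5 with target 4 needs 7 NOPs in both forms.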

diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index 82cdb8f57bfd..909c9c0b4233 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -228,7 +228,7 @@ static void cik_sdma_ring_emit_ib(struct amdgpu_ring *ring,
u32 extra_bits = vmid & 0xf;
 
/* IB packet must end on a 8 DW boundary */
-   cik_sdma_ring_insert_nop(ring, (12 - (lower_32_bits(ring->wptr) & 7)) % 
8);
+   cik_sdma_ring_insert_nop(ring, (4 - lower_32_bits(ring->wptr)) & 7);
 
amdgpu_ring_write(ring, SDMA_PACKET(SDMA_OPCODE_INDIRECT_BUFFER, 0, 
extra_bits));
amdgpu_ring_write(ring, ib->gpu_addr & 0xffe0); /* base must be 32 
byte aligned */
@@ -811,7 +811,7 @@ static void cik_sdma_ring_pad_ib(struct amdgpu_ring *ring, 
struct amdgpu_ib *ib)
u32 pad_count;
int i;
 
-   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
+   pad_count = (-ib->length_dw) & 7;
for (i = 0; i < pad_count; i++)
if (sdma && sdma->burst_nop && (i == 0))
ib->ptr[ib->length_dw++] =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
index 89e8c74a40f4..1c49f78b2014 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c
@@ -255,7 +255,7 @@ static void sdma_v2_4_ring_emit_ib(struct amdgpu_ring *ring,
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 
/* IB packet must end on a 8 DW boundary */
-   sdma_v2_4_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
% 8);
+   sdma_v2_4_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
 
amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
  SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
@@ -750,7 +750,7 @@ static void sdma_v2_4_ring_pad_ib(struct amdgpu_ring *ring, 
struct amdgpu_ib *ib
u32 pad_count;
int i;
 
-   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
+   pad_count = (-ib->length_dw) & 7;
for (i = 0; i < pad_count; i++)
if (sdma && sdma->burst_nop && (i == 0))
ib->ptr[ib->length_dw++] =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index 011fd12c41fe..db752f2e05f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -429,7 +429,7 @@ static void sdma_v3_0_ring_emit_ib(struct amdgpu_ring *ring,
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 
/* IB packet must end on a 8 DW boundary */
-   sdma_v3_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
% 8);
+   sdma_v3_0_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
 
amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
  SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
@@ -1021,7 +1021,7 @@ static void sdma_v3_0_ring_pad_ib(struct amdgpu_ring 
*ring, struct amdgpu_ib *ib
u32 pad_count;
int i;
 
-   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
+   pad_count = (-ib->length_dw) & 7;
for (i = 0; i < pad_count; i++)
if (sdma && sdma->burst_nop && (i == 0))
ib->ptr[ib->length_dw++] =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index d209cbf54e7d..7d8c2ec7df5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -698,7 +698,7 @@ static void sdma_v4_0_ring_emit_ib(struct amdgpu_ring *ring,
unsigned vmid = AMDGPU_JOB_GET_VMID(job);
 
/* IB packet must end on a 8 DW boundary */
-   sdma_v4_0_ring_insert_nop(ring, (10 - (lower_32_bits(ring->wptr) & 7)) 
% 8);
+   sdma_v4_0_ring_insert_nop(ring, (2 - lower_32_bits(ring->wptr)) & 7);
 
amdgpu_ring_write(ring, SDMA_PKT_HEADER_OP(SDMA_OP_INDIRECT) |
  SDMA_PKT_INDIRECT_HEADER_VMID(vmid & 0xf));
@@ -1580,7 +1580,7 @@ static void sdma_v4_0_ring_pad_ib(struct amdgpu_ring 
*ring, struct amdgpu_ib *ib
u32 pad_count;
int i;
 
-   pad_count = (8 - (ib->length_dw & 0x7)) % 8;
+   pad_count = (-ib->length_dw) & 7;
for (i = 0; i < pad_count; i++)
if (sdma && sdma->burst_nop && (i == 0))
ib->ptr[ib->length_dw++] =
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 64c53eed7fac..51ebb2147ee7 100644
---

RE: [PATCH] drm/amdkfd: DIQ should not use HIQ way to allocate memory

2019-11-20 Thread Zeng, Oak

Thanks for catching this. This is something I forgot to do when I reorganized
the mqd manager code.

Reviewed-by: Oak Zeng 

Oak

-Original Message-
From: amd-gfx  On Behalf Of Yong Zhao
Sent: Friday, November 15, 2019 11:09 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhao, Yong 
Subject: [PATCH] drm/amdkfd: DIQ should not use HIQ way to allocate memory

In the mqd_diq_sdma buffer, there should be only one HIQ mqd. All DIQs should
be allocated in the regular way.

Change-Id: Ibf3eb33604d0ec30501c244228cdb3b24615b699
Signed-off-by: Yong Zhao 
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c  | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index b08694ec65d7..19f0fe547c57 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -400,7 +400,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 #endif
break;
case KFD_MQD_TYPE_DIQ:
-   mqd->allocate_mqd = allocate_hiq_mqd;
+   mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd_hiq;
mqd->free_mqd = free_mqd;
mqd->load_mqd = load_mqd;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
index 5a0e30441be8..8d21325b5cbb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
@@ -432,7 +432,7 @@ struct mqd_manager *mqd_manager_init_v10(enum KFD_MQD_TYPE 
type,
pr_debug("%s@%i\n", __func__, __LINE__);
break;
case KFD_MQD_TYPE_DIQ:
-   mqd->allocate_mqd = allocate_hiq_mqd;
+   mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd_hiq;
mqd->free_mqd = free_mqd;
mqd->load_mqd = load_mqd;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index bdbcea22ad12..df77d67ec9aa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -471,7 +471,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 #endif
break;
case KFD_MQD_TYPE_DIQ:
-   mqd->allocate_mqd = allocate_hiq_mqd;
+   mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd_hiq;
mqd->free_mqd = free_mqd;
mqd->load_mqd = load_mqd;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index c9e1151b5a57..3b6b5671964c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -452,7 +452,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 #endif
break;
case KFD_MQD_TYPE_DIQ:
-   mqd->allocate_mqd = allocate_hiq_mqd;
+   mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd_hiq;
mqd->free_mqd = free_mqd;
mqd->load_mqd = load_mqd;
--
2.17.1


Re: [PATCH] drm/amd/display: Change mmhub_9_4_0_ headers to mmhub_1_0_ ones.

2019-11-20 Thread Deucher, Alexander

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Zhan Liu 

Sent: Wednesday, November 20, 2019 2:43 PM
To: amd-gfx@lists.freedesktop.org ; Liu, Zhan 
; Li, Dennis ; Zhang, Hawking 
; Zhou1, Tao 
Subject: [PATCH] drm/amd/display: Change mmhub_9_4_0_ headers to mmhub_1_0_ 
ones.

[Why]
The kernel won't compile without this patch, because the
mmhub_9_4_0_ headers are obsolete. All contents of the
mmhub_9_4_0_ headers are inherited by their corresponding
mmhub_1_0_ ones.

[How]
Change mmhub_9_4_0_ headers to their corresponding mmhub_1_0_ ones.

Signed-off-by: Zhan Liu 
---
 drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c 
b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
index 305bb0841563..53ab88ef71f5 100644
--- a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
@@ -63,8 +63,8 @@
 #include "soc15_hw_ip.h"
 #include "vega10_ip_offset.h"
 #include "nbio/nbio_6_1_offset.h"
-#include "mmhub/mmhub_9_4_0_offset.h"
-#include "mmhub/mmhub_9_4_0_sh_mask.h"
+#include "mmhub/mmhub_1_0_offset.h"
+#include "mmhub/mmhub_1_0_sh_mask.h"
 #include "reg_helper.h"

 #include "dce100/dce100_resource.h"
--
2.17.1


[PATCH] drm/amd/display: Change mmhub_9_4_0_ headers to mmhub_1_0_ ones.

2019-11-20 Thread Zhan Liu

[Why]
The kernel won't compile without this patch, because the
mmhub_9_4_0_ headers are obsolete. All contents of the
mmhub_9_4_0_ headers are inherited by their corresponding
mmhub_1_0_ ones.

[How]
Change mmhub_9_4_0_ headers to their corresponding mmhub_1_0_ ones.

Signed-off-by: Zhan Liu 
---
 drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c 
b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
index 305bb0841563..53ab88ef71f5 100644
--- a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
@@ -63,8 +63,8 @@
 #include "soc15_hw_ip.h"
 #include "vega10_ip_offset.h"
 #include "nbio/nbio_6_1_offset.h"
-#include "mmhub/mmhub_9_4_0_offset.h"
-#include "mmhub/mmhub_9_4_0_sh_mask.h"
+#include "mmhub/mmhub_1_0_offset.h"
+#include "mmhub/mmhub_1_0_sh_mask.h"
 #include "reg_helper.h"
 
 #include "dce100/dce100_resource.h"
-- 
2.17.1


Re: [PATCH] drm/amdgpu/psp: declare PSP TA firmware

2019-11-20 Thread Alex Deucher

It's optional and only used for HDCP at the moment.  I'll put it up anyway.

Alex

On Wed, Nov 20, 2019 at 7:36 AM Mike Lothian  wrote:
>
> Hi
>
> Any idea where I can find those blobs?
>
> Cheers
>
> Mike
>
> On Wed, 16 Oct 2019 at 03:48, chen gong  wrote:
> >
> > Add PSP TA firmware declaration for raven raven2 picasso
> >
> > Signed-off-by: chen gong 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> > index b96484a..b345e69 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> > @@ -40,6 +40,9 @@
> >  MODULE_FIRMWARE("amdgpu/raven_asd.bin");
> >  MODULE_FIRMWARE("amdgpu/picasso_asd.bin");
> >  MODULE_FIRMWARE("amdgpu/raven2_asd.bin");
> > +MODULE_FIRMWARE("amdgpu/picasso_ta.bin");
> > +MODULE_FIRMWARE("amdgpu/raven2_ta.bin");
> > +MODULE_FIRMWARE("amdgpu/raven_ta.bin");
> >
> >  static int psp_v10_0_init_microcode(struct psp_context *psp)
> >  {
> > --
> > 2.7.4
> >

[PATCH] drm/amdgpu: simplify runtime suspend

2019-11-20 Thread Alex Deucher

In the standard _PR3 case, the pci core handles the pci state.
The driver only needs to handle it in the legacy ATPX case.

This may fix issues with runtime suspend/resume on certain
hybrid graphics laptops.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 35 -
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ea763bc5b233..df2f4720a2f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1227,13 +1227,17 @@ static int amdgpu_pmops_runtime_suspend(struct device 
*dev)
 
ret = amdgpu_device_suspend(drm_dev, false, false);
if (amdgpu_device_supports_boco(drm_dev)) {
-   pci_save_state(pdev);
-   pci_disable_device(pdev);
-   pci_ignore_hotplug(pdev);
-   if (amdgpu_is_atpx_hybrid())
+   /* Only need to handle PCI state in the driver for ATPX
+* PCI core handles it for _PR3.
+*/
+   if (amdgpu_is_atpx_hybrid()) {
+   pci_ignore_hotplug(pdev);
+   } else {
+   pci_save_state(pdev);
+   pci_disable_device(pdev);
+   pci_ignore_hotplug(pdev);
pci_set_power_state(pdev, PCI_D3cold);
-   else if (!amdgpu_has_atpx_dgpu_power_cntl())
-   pci_set_power_state(pdev, PCI_D3hot);
+   }
drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
} else if (amdgpu_device_supports_baco(drm_dev)) {
amdgpu_device_baco_enter(drm_dev);
@@ -1255,14 +1259,19 @@ static int amdgpu_pmops_runtime_resume(struct device 
*dev)
if (amdgpu_device_supports_boco(drm_dev)) {
drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 
-   if (amdgpu_is_atpx_hybrid() ||
-   !amdgpu_has_atpx_dgpu_power_cntl())
+   /* Only need to handle PCI state in the driver for ATPX
+* PCI core handles it for _PR3.
+*/
+   if (amdgpu_is_atpx_hybrid()) {
+   pci_set_master(pdev);
+   } else {
pci_set_power_state(pdev, PCI_D0);
-   pci_restore_state(pdev);
-   ret = pci_enable_device(pdev);
-   if (ret)
-   return ret;
-   pci_set_master(pdev);
+   pci_restore_state(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret)
+   return ret;
+   pci_set_master(pdev);
+   }
} else if (amdgpu_device_supports_baco(drm_dev)) {
amdgpu_device_baco_exit(drm_dev);
}
-- 
2.23.0


Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Luben Tuikov

On 2019-11-20 13:40, Christian König wrote:
> On 20.11.19 at 18:50, Luben Tuikov wrote:
>> On 2019-11-20 12:24, Christian König wrote:
>>> On 20.11.19 at 18:16, Christian König wrote:
 On 20.11.19 at 17:49, Luben Tuikov wrote:
> On 2019-11-19 21:41, Marek Olšák wrote:
>> On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov > > wrote:
>>
>>   On 2019-11-14 10:34 p.m., Aaron Liu wrote:
>>   > From: Huang Rui mailto:ray.hu...@amd.com>>
>>   >
>>   > To align the kernel uapi change from Alex:
>>   >
>>   > "Add a flag to the GEM_CREATE ioctl to create encrypted
>> buffers. Buffers with
>>   > this flag set will be created with the TMZ bit set in the
>> PTEs or engines
>>   > accessing them. This is required in order to properly access
>> the data from the
>>   > engines."
>>   >
>>   > We will use GEM_CREATE_ENCRYPTED flag for secure buffer
>> allocation.
>>   >
>>   > Signed-off-by: Huang Rui > >
>>   > Reviewed-by: Alex Deucher > >
>>   > ---
>>   >  include/drm/amdgpu_drm.h | 5 +
>>   >  1 file changed, 5 insertions(+)
>>   >
>>   > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
>>   > index 5c28aa7..1a95e37 100644
>>   > --- a/include/drm/amdgpu_drm.h
>>   > +++ b/include/drm/amdgpu_drm.h
>>   > @@ -141,6 +141,11 @@ extern "C" {
>>   >   * releasing the memory
>>   >   */
>>   >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)
>>   > +/* Flag that BO will be encrypted and that the TMZ bit
>> should be
>>   > + * set in the PTEs when mapping this buffer via GPUVM or
>>   > + * accessing it with various hw blocks
>>   > + */
>>   > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)
>>
>>   Style!
>>   TAB char?!
>>
>>   You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
>>   Do NOT add/insert TAB chars instead of space to align columns!
>>   If when you press Tab key a tab is inserted, as opposed to the
>> line
>>   indented, then DO NOT use this editor.
>>   The Tab key should "indent according to mode" by inserting TAB
>> chars.
>>   If the line is already indented, as this one is, then it should
>> do nothing.
>>
>>
>> I disagree with this 100%. Tabs or spaces don't matter here from my
>> perspective. I also disagree with your language. It's overly impolite.
> But it's the coding style of Linux: leading tabs only. Try it with
> Emacs as described and given in
>
> linux/Documentation/process/coding-style.rst
>
> starting at line 589. And press the Tab key on an already indented
> line--nothing will happen. Linux has traditionally
> shunned from loose TAB chars in already indented lines: leading tabs
> only mode. In a proper code editor
> pressing the Tab key only indents according to buffer mode, it
> shouldn't insert a Tab char willy-nilly.
> People may set their tab stops differently for different tab
> positions and inserting a tab char may display
> incorrectly. The most portable way to align columns in an already
> indented-according-to-mode line, is
> using spaces. (Of course this doesn't matter when using spaces to
> indent, but Linux uses hard TAB chars
> to indent: linux/Documentation/process/coding-style.rst. (which also
> seem to be set to 8 chars))
>
> It's a code review, there is no "language".
 Well the section you noted also suggest to either get rid of emacs or
 change it to use some saner default values. We just got rid of emacs.
>> Yes, it says this, quote (for those who didn't open the file):
>>
>> --8<-
>>
>> That's OK, we all do.  You've probably been told by your long-time Unix
>> user helper that ``GNU emacs`` automatically formats the C sources for
>> you, and you've noticed that yes, it does do that, but the defaults it
>> uses are less than desirable (in fact, they are worse than random
>> typing - an infinite number of monkeys typing into GNU emacs would never
>> make a good program).
>>
>> So, you can either get rid of GNU emacs, or change it to use saner
>> values.  To do the latter, you can stick the following in your .emacs file:
>>
>> --8<
>>
 Regarding tabs after the initial indentation, I've just done a quick
 grep and around 14% of all defines under include/ uses that so I would
 say that this is perfectly fine.
>>> Fast typing with lazy eyes, that should read "around 71% of all defines".
>> Hmm, that's interesting. Is

[PATCH][next] drm/amd/display: fix double assignment to msg_id field

2019-11-20 Thread Colin King

From: Colin Ian King 

The msg_id field is being assigned twice. Fix this by replacing the second
assignment with an assignment to msg_size.

Addresses-Coverity: ("Unused value")
Fixes: 11a00965d261 ("drm/amd/display: Add PSP block to verify HDCP2.2 steps")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
index 2dd5feec8e6c..6791c5844e43 100644
--- a/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
+++ b/drivers/gpu/drm/amd/display/modules/hdcp/hdcp_psp.c
@@ -42,7 +42,7 @@ static void hdcp2_message_init(struct mod_hdcp *hdcp,
 	in->process.msg2_desc.msg_id = TA_HDCP_HDCP2_MSG_ID__NULL_MESSAGE;
 	in->process.msg2_desc.msg_size = 0;
 	in->process.msg3_desc.msg_id = TA_HDCP_HDCP2_MSG_ID__NULL_MESSAGE;
-	in->process.msg3_desc.msg_id = 0;
+	in->process.msg3_desc.msg_size = 0;
 }
 enum mod_hdcp_status mod_hdcp_remove_display_topology(struct mod_hdcp *hdcp)
 {
-- 
2.24.0
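
For readers skimming the archive, the class of bug this patch fixes can be sketched in isolation. The struct, the enum value and the function names below are hypothetical stand-ins, not the driver's definitions; only the field names msg_id and msg_size come from the patch.

```c
#include <stdint.h>

/* Hypothetical placeholder for a non-zero "null message" ID. */
enum { EXAMPLE_MSG_ID_NULL = 1 };

/* Hypothetical descriptor with the two fields named in the patch. */
struct example_msg_desc {
	uint32_t msg_id;
	uint32_t msg_size;
};

/* Buggy pattern: msg_id is assigned twice and msg_size is never set. */
void example_desc_init_buggy(struct example_msg_desc *d)
{
	d->msg_id = EXAMPLE_MSG_ID_NULL;
	d->msg_id = 0;		/* overwrites the ID, msg_size keeps its old value */
}

/* Fixed pattern: each field is assigned exactly once. */
void example_desc_init_fixed(struct example_msg_desc *d)
{
	d->msg_id = EXAMPLE_MSG_ID_NULL;
	d->msg_size = 0;
}
```

With the buggy variant a stale msg_size survives initialization and the ID is clobbered, which is the kind of double assignment a static analyzer can report as an unused value.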

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: AMD e9171 linux driver

2019-11-20 Thread Deucher, Alexander

Yes, amdgpu will work. The e9171 is Polaris-based.

Alex

From: amd-gfx on behalf of Yusuf Altıparmak
Sent: Wednesday, November 20, 2019 12:04 PM
To: amd-gfx@lists.freedesktop.org 
Subject: AMD e9171 linux driver

Hello,

I have a simple question. I want to use the e9171 with my embedded t1042d4rdb-64b 
board. Does the Linux kernel provide a driver for the e9171? More specifically, does 
'amdgpu' (https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/amd) 
provide a driver for the e9171?

Best regards

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Christian König

On 20.11.19 at 18:50, Luben Tuikov wrote:

On 2019-11-20 12:24, Christian König wrote:

On 20.11.19 at 18:16, Christian König wrote:

On 20.11.19 at 17:49, Luben Tuikov wrote:

On 2019-11-19 21:41, Marek Olšák wrote:

On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov <luben.tui...@amd.com> wrote:

  On 2019-11-14 10:34 p.m., Aaron Liu wrote:
  > From: Huang Rui <ray.hu...@amd.com>
  >
  > To align the kernel uapi change from Alex:
  >
  > "Add a flag to the GEM_CREATE ioctl to create encrypted
buffers. Buffers with
  > this flag set will be created with the TMZ bit set in the
PTEs or engines
  > accessing them. This is required in order to properly access
the data from the
  > engines."
  >
  > We will use GEM_CREATE_ENCRYPTED flag for secure buffer
allocation.
  >
  > Signed-off-by: Huang Rui <ray.hu...@amd.com>
  > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
  > ---
  >  include/drm/amdgpu_drm.h | 5 +
  >  1 file changed, 5 insertions(+)
  >
  > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
  > index 5c28aa7..1a95e37 100644
  > --- a/include/drm/amdgpu_drm.h
  > +++ b/include/drm/amdgpu_drm.h
  > @@ -141,6 +141,11 @@ extern "C" {
  >   * releasing the memory
  >   */
  >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)
  > +/* Flag that BO will be encrypted and that the TMZ bit
should be
  > + * set in the PTEs when mapping this buffer via GPUVM or
  > + * accessing it with various hw blocks
  > + */
  > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)

  Style!
  TAB char?!

  You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
  Do NOT add/insert TAB chars instead of spaces to align columns!
  If when you press Tab key a tab is inserted, as opposed to the
line
  indented, then DO NOT use this editor.
  The Tab key should "indent according to mode" by inserting TAB
chars.
  If the line is already indented, as this one is, then it should
do nothing.

I disagree with this 100%. Tabs or spaces don't matter here from my
perspective. I also disagree with your language. It's overly impolite.

But it's the coding style of Linux: leading tabs only. Try it with
Emacs as described and given in

linux/Documentation/process/coding-style.rst

starting at line 589. And press the Tab key on an already indented
line--nothing will happen. Linux has traditionally
shunned loose TAB chars in already indented lines: leading tabs
only mode. In a proper code editor
pressing the Tab key only indents according to buffer mode, it
shouldn't insert a Tab char willy-nilly.
People may set their tab stops differently for different tab
positions and inserting a tab char may display
incorrectly. The most portable way to align columns in an already
indented-according-to-mode line, is
using spaces. (Of course this doesn't matter when using spaces to
indent, but Linux uses hard TAB chars
to indent: linux/Documentation/process/coding-style.rst. (which also
seem to be set to 8 chars))

It's a code review, there is no "language".

Well, the section you noted also suggests either getting rid of emacs or
changing it to use some saner default values. We just got rid of emacs.

Yes, it says this, quote (for those who didn't open the file):

--8<-

That's OK, we all do.  You've probably been told by your long-time Unix
user helper that ``GNU emacs`` automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values.  To do the latter, you can stick the following in your .emacs file:

--8<

Regarding tabs after the initial indentation, I've just done a quick
grep and around 14% of all defines under include/ use that, so I would
say that this is perfectly fine.

Fast typing with lazy eyes, that should read "around 71% of all defines".

Hmm, that's interesting. Is that in linux/include or amdgpu/include?

linux/include

I've been meaning to do my own extended regex to catch those, although
I'm using Emacs and pressing Tab key only indents and would not insert
a Tab char if already indented. (So applying this regex into the pre-commit
hook of all of my Git repos would never trigger.)

I remember the olden days, circa 2000 when I first got involved with Linux,
when LKML didn't like loose tabs. Also, lead kernel developers are using Emacs,
so it's been my editor of choice since circa 1994 (I switched from vi to Emacs
largely due to the influence of a graphics prof I had in my senior year of uni,
and part due to

Re: [PATCH] drm/amdkfd: Delete KFD_MQD_TYPE_COMPUTE

2019-11-20 Thread Felix Kuehling


On 2019-11-15 11:07, Yong Zhao wrote:

It is the same as KFD_MQD_TYPE_CP, so delete it. As a result, we will
have one less mqd manager per device.

Change-Id: Iaa98fc17be06b216de7a826c3577f44bc0536b4c
Signed-off-by: Yong Zhao 


Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 3 +--
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 1 -
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 1 -
  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 3 +--
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +--
  6 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b42f34ef2b5c..f7f6df40875e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1595,7 +1595,7 @@ static int get_wave_state(struct device_queue_manager 
*dqm,
goto dqm_unlock;
}
  
-	mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_COMPUTE];

+   mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_CP];
  
  	if (!mqd_mgr->get_wave_state) {

r = -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index 28876aceb14b..b08694ec65d7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -374,7 +374,6 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE 
type,
  
  	switch (type) {

case KFD_MQD_TYPE_CP:
-   case KFD_MQD_TYPE_COMPUTE:
mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd;
mqd->free_mqd = free_mqd;
@@ -442,7 +441,7 @@ struct mqd_manager *mqd_manager_init_cik_hawaii(enum 
KFD_MQD_TYPE type,
mqd = mqd_manager_init_cik(type, dev);
if (!mqd)
return NULL;
-   if ((type == KFD_MQD_TYPE_CP) || (type == KFD_MQD_TYPE_COMPUTE))
+   if (type == KFD_MQD_TYPE_CP)
mqd->update_mqd = update_mqd_hawaii;
return mqd;
  }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
index 579c5ffcfa79..5a0e30441be8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
@@ -401,7 +401,6 @@ struct mqd_manager *mqd_manager_init_v10(enum KFD_MQD_TYPE 
type,
  
  	switch (type) {

case KFD_MQD_TYPE_CP:
-   case KFD_MQD_TYPE_COMPUTE:
pr_debug("%s@%i\n", __func__, __LINE__);
mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 22a819c888d8..bdbcea22ad12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -444,7 +444,6 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE 
type,
  
  	switch (type) {

case KFD_MQD_TYPE_CP:
-   case KFD_MQD_TYPE_COMPUTE:
mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd;
mqd->free_mqd = free_mqd;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 7d144f56f421..c9e1151b5a57 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -425,7 +425,6 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE 
type,
  
  	switch (type) {

case KFD_MQD_TYPE_CP:
-   case KFD_MQD_TYPE_COMPUTE:
mqd->allocate_mqd = allocate_mqd;
mqd->init_mqd = init_mqd;
mqd->free_mqd = free_mqd;
@@ -494,7 +493,7 @@ struct mqd_manager *mqd_manager_init_vi_tonga(enum 
KFD_MQD_TYPE type,
mqd = mqd_manager_init_vi(type, dev);
if (!mqd)
return NULL;
-   if ((type == KFD_MQD_TYPE_CP) || (type == KFD_MQD_TYPE_COMPUTE))
+   if (type == KFD_MQD_TYPE_CP)
mqd->update_mqd = update_mqd_tonga;
return mqd;
  }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 1049759dc6bb..514896bef99a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -508,8 +508,7 @@ struct queue {
   * Please read the kfd_mqd_manager.h description.
   */
  enum KFD_MQD_TYPE {
-   KFD_MQD_TYPE_COMPUTE = 0,   /* for no cp scheduling */
-   KFD_MQD_TYPE_HIQ,   /* for hiq */
+   KFD_MQD_TYPE_HIQ = 0,   /* for hiq */
KFD_MQD_TYPE_CP,/* for cp queues and diq */
KFD_MQD_TYPE_SDMA,  /* for sdma queues */
KFD_MQD_TYPE_DIQ,   /*
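
As a rough illustration of why dropping the first enumerator leaves one less manager slot per device, here is a self-contained sketch. The enum and struct names are hypothetical, and the EXAMPLE_*_MAX sentinels are an assumption, since the enum is cut off in the patch text above.

```c
#include <stddef.h>

/* Layout before the patch (illustrative names, assumed MAX sentinel). */
enum example_mqd_type_old {
	EXAMPLE_OLD_COMPUTE = 0,	/* duplicate of CP, removed by the patch */
	EXAMPLE_OLD_HIQ,
	EXAMPLE_OLD_CP,
	EXAMPLE_OLD_SDMA,
	EXAMPLE_OLD_DIQ,
	EXAMPLE_OLD_MAX			/* assumed sentinel used to size arrays */
};

/* Layout after the patch: HIQ becomes index 0, every value shifts down. */
enum example_mqd_type_new {
	EXAMPLE_NEW_HIQ = 0,
	EXAMPLE_NEW_CP,
	EXAMPLE_NEW_SDMA,
	EXAMPLE_NEW_DIQ,
	EXAMPLE_NEW_MAX
};

struct example_mqd_manager;

/* An array indexed by the enum shrinks along with the enum. */
struct example_dqm_old {
	struct example_mqd_manager *mqd_mgrs[EXAMPLE_OLD_MAX];
};

struct example_dqm_new {
	struct example_mqd_manager *mqd_mgrs[EXAMPLE_NEW_MAX];
};

/* Number of manager slots in each layout. */
size_t example_slots_old(void)
{
	return sizeof(((struct example_dqm_old *)0)->mqd_mgrs) /
	       sizeof(struct example_mqd_manager *);
}

size_t example_slots_new(void)
{
	return sizeof(((struct example_dqm_new *)0)->mqd_mgrs) /
	       sizeof(struct example_mqd_manager *);
}
```

Because the remaining enumerators are renumbered, any code that stored the numeric values rather than the names would need checking; the patch itself only uses the names.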

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Luben Tuikov

On 2019-11-20 12:24, Christian König wrote:
> On 20.11.19 at 18:16, Christian König wrote:
>> On 20.11.19 at 17:49, Luben Tuikov wrote:
>>> On 2019-11-19 21:41, Marek Olšák wrote:
 On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov >>> > wrote:

  On 2019-11-14 10:34 p.m., Aaron Liu wrote:
  > From: Huang Rui mailto:ray.hu...@amd.com>>
  >
  > To align the kernel uapi change from Alex:
  >
  > "Add a flag to the GEM_CREATE ioctl to create encrypted 
 buffers. Buffers with
  > this flag set will be created with the TMZ bit set in the 
 PTEs or engines
  > accessing them. This is required in order to properly access 
 the data from the
  > engines."
  >
  > We will use GEM_CREATE_ENCRYPTED flag for secure buffer 
 allocation.
  >
  > Signed-off-by: Huang Rui >>> >
  > Reviewed-by: Alex Deucher >>> >
  > ---
  >  include/drm/amdgpu_drm.h | 5 +
  >  1 file changed, 5 insertions(+)
  >
  > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
  > index 5c28aa7..1a95e37 100644
  > --- a/include/drm/amdgpu_drm.h
  > +++ b/include/drm/amdgpu_drm.h
  > @@ -141,6 +141,11 @@ extern "C" {
  >   * releasing the memory
  >   */
  >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)
  > +/* Flag that BO will be encrypted and that the TMZ bit 
 should be
  > + * set in the PTEs when mapping this buffer via GPUVM or
  > + * accessing it with various hw blocks
  > + */
  > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)

  Style!
  TAB char?!

  You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
  Do NOT add/insert TAB chars instead of spaces to align columns!
  If when you press Tab key a tab is inserted, as opposed to the 
 line
  indented, then DO NOT use this editor.
  The Tab key should "indent according to mode" by inserting TAB 
 chars.
  If the line is already indented, as this one is, then it should 
 do nothing.

 I disagree with this 100%. Tabs or spaces don't matter here from my 
 perspective. I also disagree with your language. It's overly impolite.
>>> But it's the coding style of Linux: leading tabs only. Try it with 
>>> Emacs as described and given in
>>>
>>> linux/Documentation/process/coding-style.rst
>>>
>>> starting at line 589. And press the Tab key on an already indented 
>>> line--nothing will happen. Linux has traditionally
>>> shunned from loose TAB chars in already indented lines: leading tabs 
>>> only mode. In a proper code editor
>>> pressing the Tab key only indents according to buffer mode, it 
>>> shouldn't insert a Tab char willy-nilly.
>>> People may set their tab stops differently for different tab 
>>> positions and inserting a tab char may display
>>> incorrectly. The most portable way to align columns in an already 
>>> indented-according-to-mode line, is
>>> using spaces. (Of course this doesn't matter when using spaces to 
>>> indent, but Linux uses hard TAB chars
>>> to indent: linux/Documentation/process/coding-style.rst. (which also 
>>> seem to be set to 8 chars))
>>>
>>> It's a code review, there is no "language".
>>
>> Well the section you noted also suggest to either get rid of emacs or 
>> change it to use some saner default values. We just got rid of emacs.

Yes, it says this, quote (for those who didn't open the file):

--8<-

That's OK, we all do.  You've probably been told by your long-time Unix
user helper that ``GNU emacs`` automatically formats the C sources for
you, and you've noticed that yes, it does do that, but the defaults it
uses are less than desirable (in fact, they are worse than random
typing - an infinite number of monkeys typing into GNU emacs would never
make a good program).

So, you can either get rid of GNU emacs, or change it to use saner
values.  To do the latter, you can stick the following in your .emacs file:

--8<

>>
>> Regarding tabs after the initial indentation, I've just done a quick 
>> grep and around 14% of all defines under include/ uses that so I would 
>> say that this is perfectly fine.
> 
> Fast typing with lazy eyes, that should read "around 71% of all defines".

Hmm, that's interesting. Is that in linux/include or amdgpu/include?

I've been meaning to do my own extended regex to catch those, although
I'm using Emacs and pressing Tab key only indents and would not insert
a Tab char if already indented. (So applying this regex into the pre-commit
hook of all of my Git repos would never trigger.)
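
The check discussed here (does a #define line have a TAB between the macro name and its value?) could be sketched as a small C helper rather than a regex. This is only an illustration under simple assumptions: the function name is made up, and function-like macros such as "#define F(x) ..." are not treated specially.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `line` looks like "#define NAME<ws>..." and the
 * whitespace run directly after NAME contains at least one TAB.
 * Hypothetical helper, not part of any kernel script.
 */
bool define_uses_tab_after_name(const char *line)
{
	const char *p = line;

	/* Skip leading indentation and the '#'. */
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p != '#')
		return false;
	p++;
	while (*p == ' ' || *p == '\t')
		p++;

	/* Require the "define" keyword followed by whitespace. */
	if (strncmp(p, "define", 6) != 0)
		return false;
	p += 6;
	if (*p != ' ' && *p != '\t')
		return false;
	while (*p == ' ' || *p == '\t')
		p++;

	/* Skip the macro name. */
	while (isalnum((unsigned char)*p) || *p == '_')
		p++;

	/* Look for a TAB in the whitespace run after the name. */
	for (; *p == ' ' || *p == '\t'; p++)
		if (*p == '\t')
			return true;
	return false;
}
```

Running such a helper over every line of the headers under include/ and dividing the hits by the number of #define lines would give a percentage comparable to the grep result quoted in this thread.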

I remember

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Harry Wentland

On 2019-11-20 12:05 p.m., Harry Wentland wrote:
> On 2019-11-20 11:49 a.m., Luben Tuikov wrote:
>> On 2019-11-19 21:41, Marek Olšák wrote:
>>> On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov >> > wrote:
>>>
>>> On 2019-11-14 10:34 p.m., Aaron Liu wrote:
>>> > From: Huang Rui mailto:ray.hu...@amd.com>>
>>> >
>>> > To align the kernel uapi change from Alex:
>>> >
>>> > "Add a flag to the GEM_CREATE ioctl to create encrypted buffers. 
>>> Buffers with
>>> > this flag set will be created with the TMZ bit set in the PTEs or 
>>> engines
>>> > accessing them. This is required in order to properly access the data 
>>> from the
>>> > engines."
>>> >
>>> > We will use GEM_CREATE_ENCRYPTED flag for secure buffer allocation.
>>> >
>>> > Signed-off-by: Huang Rui >> >
>>> > Reviewed-by: Alex Deucher >> >
>>> > ---
>>> >  include/drm/amdgpu_drm.h | 5 +
>>> >  1 file changed, 5 insertions(+)
>>> >
>>> > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
>>> > index 5c28aa7..1a95e37 100644
>>> > --- a/include/drm/amdgpu_drm.h
>>> > +++ b/include/drm/amdgpu_drm.h
>>> > @@ -141,6 +141,11 @@ extern "C" {
>>> >   * releasing the memory
>>> >   */
>>> >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE       (1 << 9)
>>> > +/* Flag that BO will be encrypted and that the TMZ bit should be
>>> > + * set in the PTEs when mapping this buffer via GPUVM or
>>> > + * accessing it with various hw blocks
>>> > + */
>>> > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)
>>>
>>> Style!
>>> TAB char?!
>>>
>>> You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
>>> Do NOT add/insert TAB chars instead of spaces to align columns!
>>> If when you press Tab key a tab is inserted, as opposed to the line
>>> indented, then DO NOT use this editor.
>>> The Tab key should "indent according to mode" by inserting TAB chars.
>>> If the line is already indented, as this one is, then it should do 
>>> nothing.
>>>
>>>
>>> I disagree with this 100%. Tabs or spaces don't matter here from my 
>>> perspective. I also disagree with your language. It's overly impolite.
>>
>> But it's the coding style of Linux: leading tabs only. Try it with Emacs as 
>> described and given in
>>
>> linux/Documentation/process/coding-style.rst
>>
>> starting at line 589. And press the Tab key on an already indented 
>> line--nothing will happen. Linux has traditionally
>> shunned from loose TAB chars in already indented lines: leading tabs only 
>> mode. In a proper code editor
>> pressing the Tab key only indents according to buffer mode, it shouldn't 
>> insert a Tab char willy-nilly.
>> People may set their tab stops differently for different tab positions and 
>> inserting a tab char may display
>> incorrectly. The most portable way to align columns in an already 
>> indented-according-to-mode line, is
>> using spaces. (Of course this doesn't matter when using spaces to indent, 
>> but Linux uses hard TAB chars
>> to indent: linux/Documentation/process/coding-style.rst. (which also seem to 
>> be set to 8 chars))
>>
>> It's a code review, there is no "language".
> 
> May I remind you that freedesktop.org hosted projects follow a code of
> conduct [1]. This applies whether the interaction is a code review or
> any other interaction.
> 
> I don't think your language was overly impolite but it did come across a
> bit strong. Please consider how your statements might be perceived by

Just to clarify, and I did intend to highlight this in the previous
sentence, I don't think there is any violation of the CoC here. I am
merely trying to say that language matters, even for code reviews.

Harry

> the person they're addressed to.
> 
> [1] https://www.freedesktop.org/wiki/CodeOfConduct/
> 
> Harry
> 
>>
>> Regards,
>> Luben
>>
>>>
>>> Marek
>>

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Christian König

On 20.11.19 at 18:16, Christian König wrote:

On 20.11.19 at 17:49, Luben Tuikov wrote:

On 2019-11-19 21:41, Marek Olšák wrote:
On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov > wrote:

 On 2019-11-14 10:34 p.m., Aaron Liu wrote:
 > From: Huang Rui mailto:ray.hu...@amd.com>>
 >
 > To align the kernel uapi change from Alex:
 >
 > "Add a flag to the GEM_CREATE ioctl to create encrypted 
buffers. Buffers with
 > this flag set will be created with the TMZ bit set in the 
PTEs or engines
 > accessing them. This is required in order to properly access 
the data from the

 > engines."
 >
 > We will use GEM_CREATE_ENCRYPTED flag for secure buffer 
allocation.

 >
 > Signed-off-by: Huang Rui >
 > Reviewed-by: Alex Deucher >

 > ---
 >  include/drm/amdgpu_drm.h | 5 +
 >  1 file changed, 5 insertions(+)
 >
 > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
 > index 5c28aa7..1a95e37 100644
 > --- a/include/drm/amdgpu_drm.h
 > +++ b/include/drm/amdgpu_drm.h
 > @@ -141,6 +141,11 @@ extern "C" {
 >   * releasing the memory
 >   */
 >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)
 > +/* Flag that BO will be encrypted and that the TMZ bit 
should be

 > + * set in the PTEs when mapping this buffer via GPUVM or
 > + * accessing it with various hw blocks
 > + */
 > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)

 Style!
 TAB char?!

 You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
 Do NOT add/insert TAB chars instead of spaces to align columns!
 If when you press Tab key a tab is inserted, as opposed to the 
line

 indented, then DO NOT use this editor.
 The Tab key should "indent according to mode" by inserting TAB 
chars.
 If the line is already indented, as this one is, then it should 
do nothing.

I disagree with this 100%. Tabs or spaces don't matter here from my 
perspective. I also disagree with your language. It's overly impolite.
But it's the coding style of Linux: leading tabs only. Try it with 
Emacs as described and given in

linux/Documentation/process/coding-style.rst

starting at line 589. And press the Tab key on an already indented 
line--nothing will happen. Linux has traditionally
shunned from loose TAB chars in already indented lines: leading tabs 
only mode. In a proper code editor
pressing the Tab key only indents according to buffer mode, it 
shouldn't insert a Tab char willy-nilly.
People may set their tab stops differently for different tab 
positions and inserting a tab char may display
incorrectly. The most portable way to align columns in an already 
indented-according-to-mode line, is
using spaces. (Of course this doesn't matter when using spaces to 
indent, but Linux uses hard TAB chars
to indent: linux/Documentation/process/coding-style.rst. (which also 
seem to be set to 8 chars))

It's a code review, there is no "language".

Well, the section you noted also suggests either getting rid of emacs or 
changing it to use some saner default values. We just got rid of emacs.

Regarding tabs after the initial indentation, I've just done a quick 
grep and around 14% of all defines under include/ use that, so I would 
say that this is perfectly fine.

Fast typing with lazy eyes, that should read "around 71% of all defines".

Sorry,
Christian.

Regards,
Christian.

Regards,
Luben

Marek


Re: Reply: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König

The hardware doesn't support inter engine interrupts and as far as I 
know there are no plans for this.


Polling is perfectly fine in this case, it just doesn't interact well 
with power gating.


Christian.

On 20.11.19 at 16:38, Zeng, Oak wrote:

Thank you, Christian. Maybe in the future we can make the invalidation ack 
interrupt-based instead of polling-based.

Regards,
Oak

-Original Message-
From: Koenig, Christian 
Sent: Wednesday, November 20, 2019 10:14 AM
To: Zeng, Oak ; Liu, Monk ; Zhu, Changfeng ; 
Xiao, Jack ; Zhou1, Tao ; Huang, Ray ; Huang, 
Shimmer ; amd-gfx@lists.freedesktop.org
Subject: Re: Reply: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround 
in amdgpu_virt

Hi Oak,


[Oak] I am not familiar with the power gating sequence, but at first glance, 
shouldn't the power gating sequence make sure that the HW is ready (idle) for 
power gating before putting the system into power gating?

The problem is that the hardware is actually idle when gated.

See what happens is the following:

1. Ring A sends an invalidate command to VM invalidation engine X.

2. VM invalidation engine X wakes up and is ungated because it now has work.

3. VM invalidation engine X finishes the invalidation and is gated again.

4. Now ring A polls for the invalidation on engine X to complete, but since the 
engine was gated again, it has forgotten that it finished that 
invalidation. BAM! Ring A will poll forever.
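
The four steps above can be modelled with a toy state machine. This is only a sketch: all names are hypothetical, the poll loop is bounded so the example terminates, and it assumes that a held semaphore keeps the engine from being gated, which is how the workaround in this thread is understood here rather than something the hardware documentation is quoted on.

```c
#include <stdbool.h>

/* Toy model of a VM invalidation engine (hypothetical, not driver code). */
struct example_inv_engine {
	bool powered;	/* false while power gated */
	bool ack;	/* invalidation acknowledge bit */
	bool sem_held;	/* assumed: a held semaphore blocks gating */
};

/* Gate the engine when idle; gating loses the ack state. */
void example_engine_try_gate(struct example_inv_engine *e)
{
	if (e->sem_held)
		return;
	e->powered = false;
	e->ack = false;
}

/* Steps 1-3: request arrives, engine ungates, finishes, goes idle. */
void example_engine_invalidate(struct example_inv_engine *e)
{
	e->powered = true;
	e->ack = true;
	example_engine_try_gate(e);
}

/* Step 4: the ring polls for the ack (bounded here instead of forever). */
bool example_poll_ack(const struct example_inv_engine *e, int max_polls)
{
	for (int i = 0; i < max_polls; i++)
		if (e->ack)
			return true;
	return false;
}

/* Without the workaround the ack is gone before the ring polls. */
bool example_flush_without_semaphore(struct example_inv_engine *e)
{
	example_engine_invalidate(e);
	return example_poll_ack(e, 1000);
}

/* With acquire before and release after, the ack survives the poll. */
bool example_flush_with_semaphore(struct example_inv_engine *e)
{
	bool ok;

	e->sem_held = true;		/* acquire */
	example_engine_invalidate(e);
	ok = example_poll_ack(e, 1000);
	e->sem_held = false;		/* release */
	example_engine_try_gate(e);	/* engine may gate again now */
	return ok;
}
```

In this model the flush without the semaphore never observes the ack, while the flush with the semaphore does and still lets the engine gate afterwards.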

Regards,
Christian.

On 20.11.19 at 16:04, Zeng, Oak wrote:

See an inline comment

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of
Christian König
Sent: Wednesday, November 20, 2019 8:21 AM
To: Liu, Monk ; Zhu, Changfeng
; Xiao, Jack ; Zhou1, Tao
; Huang, Ray ; Huang, Shimmer
; amd-gfx@lists.freedesktop.org
Subject: Re: Reply: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation
is in progress

[Oak] I am not familiar with the power gating sequence, but at first glance, 
shouldn't the power gating sequence make sure that the HW is ready (idle) for 
power gating before putting the system into power gating? E.g., before we put 
the system into power gating, should we query each HW block to see whether the 
HW is idle? If not (as in the case you mentioned, where some invalidation 
activity is still ongoing), the power gating condition is not met and we 
should wait. Or if the power gating is triggered/initiated by HW (I am not 
sure), should the HW guarantee it is idle?

which loses all bits in the ACK register and so deadlocks the engine 
waiting for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

On 20.11.19 at 14:18, Liu, Monk wrote:

Hi Changfeng

First of all, there is no power-gating off cycle involved in AMDGPU
SRIOV, since we don't allow the VF/VM to do such things, so I find it strange
that you post something like this, especially for the VEGA10 series, which
doesn't appear to have any issue in the gpu_flush part.

Here are my questions for you:
1) Can you point me to the issue you experienced, and how to
reproduce the bug?
2) If you do hit some issue, did you verify that your patch fixes it?

besides

/Monk

-Original Message-
From: amd-gfx  On Behalf Of Changfeng.Zhu
Sent: 2019-11-20 17:14
To: Koenig, Christian ; Xiao, Jack
; Zhou1, Tao ; Huang, Ray
; Huang, Shimmer ;
amd-gfx@lists.freedesktop.org
Cc: Zhu, Changfeng 
Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

From: changzhu 

The GPUVM invalidate acknowledge state may be lost across a power-gating off 
cycle. To avoid this issue in virt invalidation, acquire a semaphore before 
invalidation and release it after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)

void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
{
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Christian König

On 20.11.19 at 17:49, Luben Tuikov wrote:

On 2019-11-19 21:41, Marek Olšák wrote:

On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov <luben.tui...@amd.com> wrote:

 On 2019-11-14 10:34 p.m., Aaron Liu wrote:
 > From: Huang Rui <ray.hu...@amd.com>
 >
 > To align the kernel uapi change from Alex:
 >
 > "Add a flag to the GEM_CREATE ioctl to create encrypted buffers. Buffers 
with
 > this flag set will be created with the TMZ bit set in the PTEs or engines
 > accessing them. This is required in order to properly access the data 
from the
 > engines."
 >
 > We will use GEM_CREATE_ENCRYPTED flag for secure buffer allocation.
 >
 > Signed-off-by: Huang Rui <ray.hu...@amd.com>
 > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
 > ---
 >  include/drm/amdgpu_drm.h | 5 +
 >  1 file changed, 5 insertions(+)
 >
 > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
 > index 5c28aa7..1a95e37 100644
 > --- a/include/drm/amdgpu_drm.h
 > +++ b/include/drm/amdgpu_drm.h
 > @@ -141,6 +141,11 @@ extern "C" {
 >   * releasing the memory
 >   */
 >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE       (1 << 9)
 > +/* Flag that BO will be encrypted and that the TMZ bit should be
 > + * set in the PTEs when mapping this buffer via GPUVM or
 > + * accessing it with various hw blocks
 > + */
 > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)

 Style!
 TAB char?!

 You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
 Do NOT add/insert TAB chars instead of spaces to align columns!
 If when you press Tab key a tab is inserted, as opposed to the line
 indented, then DO NOT use this editor.
 The Tab key should "indent according to mode" by inserting TAB chars.
 If the line is already indented, as this one is, then it should do nothing.

I disagree with this 100%. Tabs or spaces don't matter here from my 
perspective. I also disagree with your language. It's overly impolite.

But it's the coding style of Linux: leading tabs only. Try it with Emacs as 
described and given in

linux/Documentation/process/coding-style.rst

starting at line 589. And press the Tab key on an already indented 
line--nothing will happen. Linux has traditionally
shunned from loose TAB chars in already indented lines: leading tabs only mode. 
In a proper code editor
pressing the Tab key only indents according to buffer mode, it shouldn't insert 
a Tab char willy-nilly.
People may set their tab stops differently for different tab positions and 
inserting a tab char may display
incorrectly. The most portable way to align columns in an already 
indented-according-to-mode line, is
using spaces. (Of course this doesn't matter when using spaces to indent, but 
Linux uses hard TAB chars
to indent: linux/Documentation/process/coding-style.rst. (which also seem to be 
set to 8 chars))

It's a code review, there is no "language".

Well, the section you noted also suggests either getting rid of emacs or 
changing it to use some saner default values. We just got rid of emacs.

Regarding tabs after the initial indentation, I've just done a quick 
grep and around 14% of all defines under include/ use them, so I would 
say that this is perfectly fine.
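
As a rough illustration of what such a check could look for, here is a minimal sketch in C. It is not the grep that was actually run; the function name `define_uses_inner_tab` is hypothetical, and the rule it applies (a TAB somewhere in the blank run between the macro name and its value) is only an assumption about what "tabs after the initial indentation" means for a `#define` line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if `line` is a "#define NAME value"
 * line whose blank run between NAME and the value contains a TAB,
 * and 0 otherwise. */
int define_uses_inner_tab(const char *line)
{
	const char *p = line;

	assert(line != NULL);

	if (strncmp(p, "#define", 7) != 0)
		return 0;
	p += 7;

	/* There must be at least one blank between "#define" and the name. */
	if (*p != ' ' && *p != '\t')
		return 0;
	while (*p == ' ' || *p == '\t')
		p++;

	/* Skip the macro name up to the first blank. A parameter list
	 * that itself contains blanks is not handled by this sketch. */
	while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
		p++;

	/* Look for a TAB in the blank run that precedes the value. */
	for (; *p == ' ' || *p == '\t'; p++)
		if (*p == '\t')
			return 1;

	return 0;
}
```

Counting the lines for which this returns 1, relative to all `#define` lines, would give a ratio comparable in spirit to the figure quoted above, although the exact number depends on the matching rule.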

Regards,
Christian.

Regards,
Luben

Marek

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Harry Wentland

On 2019-11-20 11:49 a.m., Luben Tuikov wrote:
> On 2019-11-19 21:41, Marek Olšák wrote:
>> On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov wrote:
>>
>> On 2019-11-14 10:34 p.m., Aaron Liu wrote:
>> > From: Huang Rui <ray.hu...@amd.com>
>> >
>> > To align the kernel uapi change from Alex:
>> >
>> > "Add a flag to the GEM_CREATE ioctl to create encrypted buffers. 
>> Buffers with
>> > this flag set will be created with the TMZ bit set in the PTEs or 
>> engines
>> > accessing them. This is required in order to properly access the data 
>> from the
>> > engines."
>> >
>> > We will use GEM_CREATE_ENCRYPTED flag for secure buffer allocation.
>> >
>> > Signed-off-by: Huang Rui <ray.hu...@amd.com>
>> > Reviewed-by: Alex Deucher
>> > ---
>> >  include/drm/amdgpu_drm.h | 5 +
>> >  1 file changed, 5 insertions(+)
>> >
>> > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
>> > index 5c28aa7..1a95e37 100644
>> > --- a/include/drm/amdgpu_drm.h
>> > +++ b/include/drm/amdgpu_drm.h
>> > @@ -141,6 +141,11 @@ extern "C" {
>> >   * releasing the memory
>> >   */
>> >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE       (1 << 9)
>> > +/* Flag that BO will be encrypted and that the TMZ bit should be
>> > + * set in the PTEs when mapping this buffer via GPUVM or
>> > + * accessing it with various hw blocks
>> > + */
>> > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)
>>
>> Style!
>> TAB char?!
>>
>> You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
>> Do NOT add/insert TAB chars instead of spaces to align columns!
>> If when you press Tab key a tab is inserted, as opposed to the line
>> indented, then DO NOT use this editor.
>> The Tab key should "indent according to mode" by inserting TAB chars.
>> If the line is already indented, as this one is, then it should do 
>> nothing.
>>
>>
>> I disagree with this 100%. Tabs or spaces don't matter here from my 
>> perspective. I also disagree with your language. It's overly impolite.
> 
> But it's the coding style of Linux: leading tabs only. Try it with Emacs as 
> described and given in
> 
> linux/Documentation/process/coding-style.rst
> 
> starting at line 589. And press the Tab key on an already indented 
> line--nothing will happen. Linux has traditionally
> shunned from loose TAB chars in already indented lines: leading tabs only 
> mode. In a proper code editor
> pressing the Tab key only indents according to buffer mode, it shouldn't 
> insert a Tab char willy-nilly.
> People may set their tab stops differently for different tab positions and 
> inserting a tab char may display
> incorrectly. The most portable way to align columns in an already 
> indented-according-to-mode line, is
> using spaces. (Of course this doesn't matter when using spaces to indent, but 
> Linux uses hard TAB chars
> to indent: linux/Documentation/process/coding-style.rst. (which also seem to 
> be set to 8 chars))
> 
> It's a code review, there is no "language".

May I remind you that freedesktop.org hosted projects follow a code of
conduct [1]. This applies whether the interaction is a code review or
any other interaction.

I don't think your language was overly impolite but it did come across a
bit strong. Please consider how your statements might be perceived by
the person they're addressed to.

[1] https://www.freedesktop.org/wiki/CodeOfConduct/

Harry

> 
> Regards,
> Luben
> 
>>
>> Marek
> 

AMD e9171 linux driver

2019-11-20 Thread Yusuf Altıparmak

Hello,

I have a simple question. I want to use the e9171 with my embedded t1042d4rdb-64b
board. Does the Linux kernel provide a driver for the e9171? More specifically, does
'amdgpu' (https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/amd)
provide a driver for the e9171?

Best regards

Re: [PATCH 01/12] amdgpu: add UAPI for creating encrypted buffers

2019-11-20 Thread Luben Tuikov

On 2019-11-19 21:41, Marek Olšák wrote:
> On Tue, Nov 19, 2019 at 8:52 PM Luben Tuikov wrote:
> 
> On 2019-11-14 10:34 p.m., Aaron Liu wrote:
> > From: Huang Rui <ray.hu...@amd.com>
> >
> > To align the kernel uapi change from Alex:
> >
> > "Add a flag to the GEM_CREATE ioctl to create encrypted buffers. 
> Buffers with
> > this flag set will be created with the TMZ bit set in the PTEs or 
> engines
> > accessing them. This is required in order to properly access the data 
> from the
> > engines."
> >
> > We will use GEM_CREATE_ENCRYPTED flag for secure buffer allocation.
> >
> > Signed-off-by: Huang Rui <ray.hu...@amd.com>
> > Reviewed-by: Alex Deucher
> > ---
> >  include/drm/amdgpu_drm.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
> > index 5c28aa7..1a95e37 100644
> > --- a/include/drm/amdgpu_drm.h
> > +++ b/include/drm/amdgpu_drm.h
> > @@ -141,6 +141,11 @@ extern "C" {
> >   * releasing the memory
> >   */
> >  #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE       (1 << 9)
> > +/* Flag that BO will be encrypted and that the TMZ bit should be
> > + * set in the PTEs when mapping this buffer via GPUVM or
> > + * accessing it with various hw blocks
> > + */
> > +#define AMDGPU_GEM_CREATE_ENCRYPTED          (1 << 10)
> 
> Style!
> TAB char?!
> 
> You have a TAB char between ".._ENCRYPTED" and "(1 << 10)"
> Do NOT add/insert TAB chars instead of spaces to align columns!
> If when you press Tab key a tab is inserted, as opposed to the line
> indented, then DO NOT use this editor.
> The Tab key should "indent according to mode" by inserting TAB chars.
> If the line is already indented, as this one is, then it should do 
> nothing.
> 
> 
> I disagree with this 100%. Tabs or spaces don't matter here from my 
> perspective. I also disagree with your language. It's overly impolite.

But it's the coding style of Linux: leading tabs only. Try it with Emacs as 
described and given in

linux/Documentation/process/coding-style.rst

starting at line 589. And press the Tab key on an already indented 
line--nothing will happen. Linux has traditionally
shunned from loose TAB chars in already indented lines: leading tabs only mode. 
In a proper code editor
pressing the Tab key only indents according to buffer mode, it shouldn't insert 
a Tab char willy-nilly.
People may set their tab stops differently for different tab positions and 
inserting a tab char may display
incorrectly. The most portable way to align columns in an already 
indented-according-to-mode line, is
using spaces. (Of course this doesn't matter when using spaces to indent, but 
Linux uses hard TAB chars
to indent: linux/Documentation/process/coding-style.rst. (which also seem to be 
set to 8 chars))
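
To make the tab-stop argument above concrete, here is a minimal sketch in C. The helper name `display_column` is hypothetical and not taken from any kernel tool; it assumes the usual rendering rule that a TAB advances to the next multiple of the tab stop and that every other byte occupies one column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the display column reached after
 * rendering `s`, where a TAB advances to the next multiple of
 * `tabstop` and every other byte occupies exactly one column. */
size_t display_column(const char *s, size_t tabstop)
{
	size_t col = 0;

	assert(tabstop > 0);

	for (; *s != '\0'; s++) {
		if (*s == '\t')
			col += tabstop - (col % tabstop);
		else
			col++;
	}
	return col;
}
```

Under these assumptions, `display_column("#define A\t", 8)` yields 16 while `display_column("#define A\t", 4)` yields 12, so a value aligned with a TAB lands in a different column depending on the reader's tab stop. The space-padded `"#define A  "` ends at column 11 for any tab stop.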

It's a code review, there is no "language".

Regards,
Luben

> 
> Marek

Re: [PATCH] drm/amdgpu: Update Arcturus golden registers

2019-11-20 Thread Deucher, Alexander

Acked-by: Alex Deucher 

From: amd-gfx  on behalf of Jay Cornwall 

Sent: Wednesday, November 20, 2019 11:38 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Cornwall, Jay 
Subject: [PATCH] drm/amdgpu: Update Arcturus golden registers

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8073fcd..9f90448 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -692,6 +692,7 @@ static const struct soc15_reg_golden 
golden_settings_gc_9_4_1_arct[] =
 SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_4_ARCT, 0x3fff, 
0xb90f5b1),
 SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_5_ARCT, 0x3ff, 0x135),
 SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_CONFIG, 0x, 0x011A),
+   SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_FIFO_SIZES, 0x, 0x0f00),
 };

 static const u32 GFX_RLC_SRM_INDEX_CNTL_ADDR_OFFSETS[] =
--
2.7.4

[PATCH] drm/amdgpu: Update Arcturus golden registers

2019-11-20 Thread Jay Cornwall

Signed-off-by: Jay Cornwall 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 8073fcd..9f90448 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -692,6 +692,7 @@ static const struct soc15_reg_golden 
golden_settings_gc_9_4_1_arct[] =
SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_4_ARCT, 0x3fff, 
0xb90f5b1),
SOC15_REG_GOLDEN_VALUE(GC, 0, mmTCP_CHAN_STEER_5_ARCT, 0x3ff, 0x135),
SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_CONFIG, 0x, 0x011A),
+   SOC15_REG_GOLDEN_VALUE(GC, 0, mmSQ_FIFO_SIZES, 0x, 0x0f00),
 };
 
 static const u32 GFX_RLC_SRM_INDEX_CNTL_ADDR_OFFSETS[] =
-- 
2.7.4

RE: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Zeng, Oak

Thank you, Christian. Maybe in the future we can make the invalidation ack 
interrupt-based instead of polling-based.

Regards,
Oak

-Original Message-
From: Koenig, Christian  
Sent: Wednesday, November 20, 2019 10:14 AM
To: Zeng, Oak ; Liu, Monk ; Zhu, Changfeng 
; Xiao, Jack ; Zhou1, Tao 
; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround 
in amdgpu_virt

Hi Oak,

> [Oak] I am not familiar about the power gating sequence but from first 
> glance, should the power gating sequence make sure that HW is ready (idle) 
> for power gating before put the system to power gating?
The problem is that the hardware is actually idle when gated.

See what happens is the following:

1. Ring A sends an invalidate command to VM invalidation engine X.

2. VM invalidation engine X wakes up and is ungated because it now has work.

3. VM invalidation engine X finishes the invalidation and goes back to be gated 
again.

4. Now ring A polls for the invalidation on engine X to complete, but since it 
got back to be gated again it has forgotten that we have finished that 
invalidation. BAM! Ring A will poll forever.

Regards,
Christian.

On 20.11.19 at 16:04, Zeng, Oak wrote:
> See an inline comment
>
> Regards,
> Oak
>
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Christian König
> Sent: Wednesday, November 20, 2019 8:21 AM
> To: Liu, Monk ; Zhu, Changfeng 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
> Hi Monk,
>
> this is a fix for power gating the MMHUB.
>
> Basic problem is that the MMHUB can power gate while an invalidation 
> is in progress [Oak] I am not familiar about the power gating sequence but 
> from first glance, should the power gating sequence make sure that HW is 
> ready (idle) for power gating before put the system to power gating? E.g., 
> before we put the system to power gating, should we enquiry each HW blocks to 
> see whether the HW is idle? If not (like the case you mentioned some 
> invalidation activities is still ongoing) the power gating condition is not 
> mature and we should we wait. Or if the power gating is trigger/initiated by 
> HW (I am not sure), HW should guarantee it is idle?
>
>   which looses all bits in the ACK register and so deadlocks the engine 
> waiting for the invalidation to finish.
>
> This bug is hit immediately when we enable power gating of the MMHUB.
>
> Regards,
> Christian.
>
> On 20.11.19 at 14:18, Liu, Monk wrote:
>> Hi Changfeng
>>
>> First of all, there is no power-gating off cycle involved in AMDGPU 
>> SRIOV, since we don't allow the VF/VM to do such things, so I find it 
>> strange that you post something like this, especially for the VEGA10 
>> series, which doesn't appear to have any issue in the gpu_flush part.
>>
>> Here are my questions for you:
>> 1) Can you point me to the issue you experienced, and how to 
>> reproduce the bug?
>> 2) If you did hit an issue, did you verify that your patch fixes it?
>>
>> besides
>>
>> /Monk
>>
>> -Original Message-
>> From: amd-gfx  On Behalf Of Changfeng.Zhu
>> Sent: 2019-11-20 17:14
>> To: Koenig, Christian; Xiao, Jack; Zhou1, Tao; Huang, Ray;
>> Huang, Shimmer; amd-gfx@lists.freedesktop.org
>> Cc: Zhu, Changfeng
>> Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
>> amdgpu_virt
>>
>> From: changzhu 
>>
>> It may lose gpuvm invalidate acknowldege state across power-gating off 
>> cycle. To avoid this issue in virt invalidation, add semaphore acquire 
>> before invalidation and semaphore release after invalidation.
>>
>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>> Signed-off-by: changzhu 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>>3 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index f04eb1a64271..70ffaf91cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
>> *adev, uint32_t reg, uint32_t v)
>>
>>void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>  uint32_t reg0, uint32_t reg1,
>> -uint32_t ref, uint32_t mask)
>> +uint32_t ref, uint32_t mask,
>> +uint32_t sem)
>>{
>>  struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>  struct amdgpu_ring *ring = &kiq->ring;
>> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>  uint32_t seq;
>>
>>

Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


Hi Oak,


[Oak] I am not familiar about the power gating sequence but from first glance, 
should the power gating sequence make sure that HW is ready (idle) for power 
gating before put the system to power gating?

The problem is that the hardware is actually idle when gated.

See what happens is the following:

1. Ring A sends an invalidate command to VM invalidation engine X.

2. VM invalidation engine X wakes up and is ungated because it now has work.

3. VM invalidation engine X finishes the invalidation and goes back to 
be gated again.


4. Now ring A polls for the invalidation on engine X to complete, but 
since it got back to be gated again it has forgotten that we have 
finished that invalidation. BAM! Ring A will poll forever.
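
The four steps above, and the effect of the proposed semaphore, can be sketched as a toy model in C. This is only an illustration under stated assumptions: every name here (`inv_engine`, `engine_try_power_gate`, and so on) is hypothetical, the model is not driver or hardware code, and it assumes, as discussed in this thread, that gating wipes the ACK state and that holding the semaphore keeps the block from being gated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a VM invalidation engine; all names are hypothetical. */
struct inv_engine {
	bool ack;       /* "invalidation done" bit that the ring polls */
	int sem_count;  /* > 0 means somebody holds the semaphore */
};

/* Gating is only possible while nobody holds the semaphore.
 * Gating wipes the engine state, including the ACK bit. */
void engine_try_power_gate(struct inv_engine *e)
{
	if (e->sem_count == 0)
		e->ack = false;
}

void engine_invalidate(struct inv_engine *e)
{
	e->ack = true;             /* steps 1-2: work arrives and completes */
	engine_try_power_gate(e);  /* step 3: idle again, gating is attempted */
}

/* Step 4 without the workaround: returns whether the ring sees the ACK. */
bool flush_without_semaphore(struct inv_engine *e)
{
	engine_invalidate(e);
	return e->ack;
}

/* Same flush wrapped in semaphore acquire and release. */
bool flush_with_semaphore(struct inv_engine *e)
{
	bool seen;

	e->sem_count++;            /* acquire before the invalidation */
	engine_invalidate(e);
	seen = e->ack;             /* poll while gating is still blocked */
	assert(e->sem_count > 0);
	e->sem_count--;            /* release after the invalidation */
	return seen;
}
```

In this model `flush_without_semaphore()` never observes the ACK, which corresponds to ring A polling forever, while `flush_with_semaphore()` does observe it because the engine cannot be gated between completion and the poll.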


Regards,
Christian.

On 20.11.19 at 16:04, Zeng, Oak wrote:

See an inline comment

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Wednesday, November 20, 2019 8:21 AM
To: Liu, Monk ; Zhu, Changfeng ; Xiao, Jack 
; Zhou1, Tao ; Huang, Ray ; Huang, 
Shimmer ; amd-gfx@lists.freedesktop.org
Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround 
in amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in 
progress
[Oak] I am not familiar about the power gating sequence but from first glance, 
should the power gating sequence make sure that HW is ready (idle) for power 
gating before put the system to power gating? E.g., before we put the system to 
power gating, should we enquiry each HW blocks to see whether the HW is idle? 
If not (like the case you mentioned some invalidation activities is still 
ongoing) the power gating condition is not mature and we should we wait. Or if 
the power gating is trigger/initiated by HW (I am not sure), HW should 
guarantee it is idle?

  which looses all bits in the ACK register and so deadlocks the engine waiting 
for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

On 20.11.19 at 14:18, Liu, Monk wrote:

Hi Changfeng

First of all, there is no power-gating off cycle involved in AMDGPU
SRIOV, since we don't allow the VF/VM to do such things, so I find it
strange that you post something like this, especially for the VEGA10
series, which doesn't appear to have any issue in the gpu_flush part.

Here are my questions for you:
1) Can you point me to the issue you experienced, and how to
reproduce the bug?
2) If you did hit an issue, did you verify that your patch fixes it?

besides

/Monk

-Original Message-
From: amd-gfx  On Behalf Of Changfeng.Zhu
Sent: 2019-11-20 17:14
To: Koenig, Christian; Xiao, Jack; Zhou1, Tao; Huang, Ray;
Huang, Shimmer; amd-gfx@lists.freedesktop.org
Cc: Zhu, Changfeng
Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

From: changzhu 

It may lose gpuvm invalidate acknowldege state across power-gating off cycle. 
To avoid this issue in virt invalidation, add semaphore acquire before 
invalidation and semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
   3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)
   
   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
   {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
   
   	spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,

[PATCH] drm/amd/display: make various variable in fixed31_32.h 'global' instead of 'static'

2019-11-20 Thread yu kuai

fixed31_32.h declares various variables 'static const', which is ugly and
wastes memory.

Every file that includes the header gets its own copy of those
variables, which is why there are numerous gcc
'-Wunused-but-set-variable' warnings related to them.

Fix it by initializing the variables in a new file "fixed31_32.c" and
declaring them 'extern' in "fixed31_32.h".

Fixes: eb0e515464e4 ("drm/amd/display: get rid of 32.32 unsigned fixed point")
Signed-off-by: yu kuai 
---

BTW, this is the best I can think of; there may be a better solution.
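
For readers unfamiliar with the pattern, here is a minimal single-file sketch in C of the header/source split this patch describes. The split into two files is indicated by comments only, the helper `fixpt_epsilon_raw` is hypothetical, and only the two constants whose values are unambiguous in this patch are shown.

```c
#include <assert.h>

/* --- what would live in the header (fixed31_32.h) --- */
struct fixed31_32 {
	long long value;
};

/* Declarations only: including files get no private copy. */
extern const struct fixed31_32 dc_fixpt_zero;
extern const struct fixed31_32 dc_fixpt_epsilon;

/* --- what would live in exactly one source file (fixed31_32.c) --- */
const struct fixed31_32 dc_fixpt_zero = { 0 };
const struct fixed31_32 dc_fixpt_epsilon = { 1LL };

/* --- hypothetical user code in any other file --- */
long long fixpt_epsilon_raw(void)
{
	/* Both names resolve to the single shared definition above. */
	assert(dc_fixpt_zero.value == 0);
	return dc_fixpt_epsilon.value;
}
```

With 'static const' in the header, each including file would instead carry its own copy of every constant, used or not, which is the situation the commit message describes.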

 drivers/gpu/drm/amd/display/amdgpu_dm/Makefile  |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/fixed31_32.c  | 17 +
 .../gpu/drm/amd/display/include/fixed31_32.h| 16 
 3 files changed, 26 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/fixed31_32.c

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/Makefile 
b/drivers/gpu/drm/amd/display/amdgpu_dm/Makefile
index 9a3b7bf8ab0b..8ce291a0279b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/Makefile
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/Makefile
@@ -25,7 +25,7 @@
 
 
 
-AMDGPUDM = amdgpu_dm.o amdgpu_dm_irq.o amdgpu_dm_mst_types.o amdgpu_dm_color.o
+AMDGPUDM = amdgpu_dm.o amdgpu_dm_irq.o amdgpu_dm_mst_types.o amdgpu_dm_color.o 
fixed31_32.o
 
 ifneq ($(CONFIG_DRM_AMD_DC),)
 AMDGPUDM += amdgpu_dm_services.o amdgpu_dm_helpers.o amdgpu_dm_pp_smu.o
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/fixed31_32.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/fixed31_32.c
new file mode 100644
index ..1f51587e342b
--- /dev/null
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/fixed31_32.c
@@ -0,0 +1,17 @@
+/*
+ * Author: yu kuai 
+ */
+
+struct fixed31_32 {
+   long long value;
+};
+
+const struct fixed31_32 dc_fixpt_zero = { 0 };
+const struct fixed31_32 dc_fixpt_epsilon = { 1LL };
+const struct fixed31_32 dc_fixpt_half = { 0x8000LL };
+const struct fixed31_32 dc_fixpt_one = { 0x1LL };
+
+const struct fixed31_32 dc_fixpt_two_pi = { 26986075409LL };
+const struct fixed31_32 dc_fixpt_ln2 = { 2977044471LL };
+const struct fixed31_32 dc_fixpt_ln2_div_2 = { 1488522236LL };
+
diff --git a/drivers/gpu/drm/amd/display/include/fixed31_32.h 
b/drivers/gpu/drm/amd/display/include/fixed31_32.h
index 291215362e3f..d8dbe96f9b19 100644
--- a/drivers/gpu/drm/amd/display/include/fixed31_32.h
+++ b/drivers/gpu/drm/amd/display/include/fixed31_32.h
@@ -64,14 +64,14 @@ struct fixed31_32 {
  * Useful constants
  */
 
-static const struct fixed31_32 dc_fixpt_zero = { 0 };
-static const struct fixed31_32 dc_fixpt_epsilon = { 1LL };
-static const struct fixed31_32 dc_fixpt_half = { 0x8000LL };
-static const struct fixed31_32 dc_fixpt_one = { 0x1LL };
-
-static const struct fixed31_32 dc_fixpt_two_pi = { 26986075409LL };
-static const struct fixed31_32 dc_fixpt_ln2 = { 2977044471LL };
-static const struct fixed31_32 dc_fixpt_ln2_div_2 = { 1488522236LL };
+extern const struct fixed31_32 dc_fixpt_zero;
+extern const struct fixed31_32 dc_fixpt_epsilon;
+extern const struct fixed31_32 dc_fixpt_half;
+extern const struct fixed31_32 dc_fixpt_one;
+
+extern const struct fixed31_32 dc_fixpt_two_pi;
+extern const struct fixed31_32 dc_fixpt_ln2;
+extern const struct fixed31_32 dc_fixpt_ln2_div_2;
 
 /*
  * @brief
-- 
2.17.2

[PATCH 0/2] fix inappropriate use of declaring variable 'static' in fixed31_32.h

2019-11-20 Thread yu kuai

The first patch removes two set-but-unused variables.

The second patch makes the variables in fixed31_32.h 'global'
instead of 'static'.

yu kuai (2):
  drm/amd/display: remove set but not used variable 'dc_fixpt_e' and
'dc_fixpt_pi'
  drm/amd/display: make various variables in fixed31_32.h 'global'
instead of 'static'

 drivers/gpu/drm/amd/display/amdgpu_dm/Makefile |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/fixed31_32.c | 13 +
 .../gpu/drm/amd/display/include/fixed31_32.h   | 18 --
 3 files changed, 22 insertions(+), 11 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/display/amdgpu_dm/fixed31_32.c

-- 
2.17.2

[PATCH 1/2] drm/amd/display: remove set but not used variable 'dc_fixpt_e' and 'dc_fixpt_pi'

2019-11-20 Thread yu kuai

'dc_fixpt_e' and 'dc_fixpt_pi' are set in 'fixed31_32.h'. However, they
are never used and so can be removed.

Fixes: eb0e515464e4 ("drm/amd/display: get rid of 32.32 unsigned fixed point")
Signed-off-by: yu kuai 
---
 drivers/gpu/drm/amd/display/include/fixed31_32.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/include/fixed31_32.h 
b/drivers/gpu/drm/amd/display/include/fixed31_32.h
index 89ef9f6860e5..291215362e3f 100644
--- a/drivers/gpu/drm/amd/display/include/fixed31_32.h
+++ b/drivers/gpu/drm/amd/display/include/fixed31_32.h
@@ -69,9 +69,7 @@ static const struct fixed31_32 dc_fixpt_epsilon = { 1LL };
 static const struct fixed31_32 dc_fixpt_half = { 0x8000LL };
 static const struct fixed31_32 dc_fixpt_one = { 0x1LL };
 
-static const struct fixed31_32 dc_fixpt_pi = { 13493037705LL };
 static const struct fixed31_32 dc_fixpt_two_pi = { 26986075409LL };
-static const struct fixed31_32 dc_fixpt_e = { 11674931555LL };
 static const struct fixed31_32 dc_fixpt_ln2 = { 2977044471LL };
 static const struct fixed31_32 dc_fixpt_ln2_div_2 = { 1488522236LL };
 
-- 
2.17.2

Re: Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


On 20.11.19 at 15:59, Liu, Monk wrote:

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.

I know,


+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

But ring->funcs->vmhub will always be AMDGPU_GFXHUB, right? This ring comes 
from "&kiq->ring".


Ah! Good catch, that is indeed incorrect.

Christian.





Yes, agree. But since we now knew that we won't need that we can just drop this 
patch altogether.

Yeah, the semaphore wrapping is in PATCH 2/2, agree that this PATCH 1/2 could 
be dropped


-Original Message-
From: Christian König
Sent: 2019-11-20 22:39
To: Liu, Monk; Zhu, Changfeng; Koenig, Christian; Xiao, Jack; Zhou1, Tao;
Huang, Ray; Huang, Shimmer; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.


Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not purely a helper
function that only serves VM invalidation, so I don't think you should
put the semaphore read/write in this routine. Instead, you can put the
semaphore read/write outside of this routine, only around the VM
invalidation logic.

Yes, agree. But since we now knew that we won't need that we can just drop this 
patch altogether.

Regards,
Christian.

On 20.11.19 at 15:30, Liu, Monk wrote:

Thanks for sharing this JIR

now I got the picture of this issue from you and Christian.

So the semaphore grabbing can prevent RTL to power off the MMHUB, I
see

The practice is that SRIOV won't enable PG at all (even our GIM driver
won't enable PG, maybe in future we would enable it )

I think I don't have too many concern about your patches,

But I have comments on your patch 1:

void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
   {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
   
   	spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);


See that in this routine, the ring is always KIQ, so below code looks redundant 
:

+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not purely a helper
function that only serves VM invalidation, so I don't think you should
put the semaphore read/write in this routine. Instead, you can put the
semaphore read/write outside of this routine, only around the VM
invalidation logic.

Thanks

-Original Message-
From: Zhu, Changfeng
Sent: 2019-11-20 22:17
To: Koenig, Christian; Liu, Monk; Xiao, Jack; Zhou1, Tao;
Huang, Ray; Huang, Shimmer; amd-gfx@lists.freedesktop.org
Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt


Did Changfeng already hit this issue under SRIOV ???

I hit this problem on navi14 in gmc_v10_0_emit_flush_gpu_tlb.
The problem has also been seen by Zhou, Tao.

This is the ticket:
http://ontrack-internal.amd.com/browse/SWDEV-201459

With the semaphore patch applied, the problem is fixed.

If SRIOV has concerns about this problem, we should not add the semaphore for SRIOV.

However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/
gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/
gmc_v10_0_emit_flush_gpu_tlb

Or how can we handle the ticket above?

BR,
Changfeng.

-Original Message-
From: Christian König 
Sent: Wednesday, November 20, 2019 10:00 PM
To: Liu, Monk ; Koenig, Christian
; Zhu, Changfeng ;
Xiao, Jack ; Zhou1, Tao ; Huang,
Ray ; Huang, Shimmer ;
amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt


Did Changfeng already hit this issue under SRIOV ?

I don't think so, but Changfeng needs to answer this.

Question is does

RE: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Zeng, Oak

See an inline comment 

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Wednesday, November 20, 2019 8:21 AM
To: Liu, Monk ; Zhu, Changfeng ; Xiao, 
Jack ; Zhou1, Tao ; Huang, Ray 
; Huang, Shimmer ; 
amd-gfx@lists.freedesktop.org
Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround 
in amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in 
progress
[Oak] I am not familiar with the power gating sequence, but at first glance, 
shouldn't the power gating sequence make sure that the HW is ready (idle) 
before putting the system into power gating? E.g., before we power gate, 
should we query each HW block to see whether it is idle? If not (as in the 
case you mentioned, where some invalidation activity is still ongoing), the 
conditions for power gating are not met and we should wait. Or, if power 
gating is triggered/initiated by HW (I am not sure), shouldn't the HW 
guarantee that it is idle?

 which loses all bits in the ACK register and so deadlocks the engine waiting 
for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

On 20.11.19 at 14:18, Liu, Monk wrote:
> Hi Changfeng
>
> First of all, there is no power-gating off cycle involved in AMDGPU 
> SRIOV, since we don't allow the VF/VM to do such things, so I find it 
> strange that you posted something like this, especially for the VEGA10 
> series, which doesn't seem to have any issue in the gpu_flush part.
>
> Here are my questions for you:
> 1) Can you point me to the issue you experienced, and how to 
> reproduce the bug?
> 2) If you do hit some issue, did you verify that your patch fixes it?
>
> besides
>
> /Monk
>
> -Original Message-
> From: amd-gfx  On Behalf Of Changfeng.Zhu
> Sent: November 20, 2019 17:14
> To: Koenig, Christian ; Xiao, Jack 
> ; Zhou1, Tao ; Huang, Ray 
> ; Huang, Shimmer ; 
> amd-gfx@lists.freedesktop.org
> Cc: Zhu, Changfeng 
> Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
> amdgpu_virt
>
> From: changzhu 
>
> The gpuvm invalidate acknowledge state may be lost across a power-gating 
> off cycle. To avoid this issue in virt invalidation, add a semaphore 
> acquire before invalidation and a semaphore release after invalidation.
>
> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
>   3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index f04eb1a64271..70ffaf91cd12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>   
>   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>                                           uint32_t reg0, uint32_t reg1,
> -                                         uint32_t ref, uint32_t mask)
> +                                         uint32_t ref, uint32_t mask,
> +                                         uint32_t sem)
>   {
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>       struct amdgpu_ring *ring = &kiq->ring;
> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>       uint32_t seq;
>   
>       spin_lock_irqsave(&kiq->ring_lock, flags);
> -     amdgpu_ring_alloc(ring, 32);
> +     amdgpu_ring_alloc(ring, 60);
> +
> +     /*
> +      * It may lose gpuvm invalidate acknowledge state across power-gating
> +      * off cycle, add semaphore acquire before invalidation and semaphore
> +      * release after invalidation to avoid entering power gated state
> +      * to WA the Issue
> +      */
> +
> +     /* a read return value of 1 means semaphore acquire */
> +     if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> +         ring->funcs->vmhub == AMDGPU_MMHUB_1)
> +             amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
> +
>       amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>                                           ref, mask);
> +     /*
> +      * add semaphore release after invalidation,
> +      * write with 0 means semaphore release
> +      */
> +     if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> +         ring->funcs->vmhub == AMDGPU_MMHUB_1)
> +             amdgpu_ring_emit_wreg(ring, sem, 0);
> +
>       amdgpu_fence_emit_polling(ring, &seq);
>       amdgpu_ring_commit(ring);
>       spin_unlock_irqrestore(&kiq->ring_lock, flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index b0b2bdc750df..bda6a2f37dc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@
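
As a rough illustration of the acquire/invalidate/release sequence the patch above adds, here is a minimal, self-contained C model. It is only a sketch: struct hub_model and the hub_* functions are hypothetical stand-ins invented for this example, not amdgpu code. The model simply assumes, as discussed in this thread, that a held semaphore keeps the hub from power gating and that power gating clears the ACK bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEM_MAX_TRIES 100u /* arbitrary polling bound for the model */

/* Hypothetical software model of one VM hub; not the real register layout. */
struct hub_model {
	uint32_t sem_held;    /* 1 while the invalidation semaphore is held */
	bool     power_gated; /* true once the hub has been power gated */
	uint32_t ack;         /* one invalidation ACK bit per VMID */
};

/* A read that returns 1 means this read acquired the semaphore. */
static uint32_t hub_sem_read(struct hub_model *hub)
{
	if (hub->sem_held)
		return 0;
	hub->sem_held = 1;
	return 1;
}

/* Writing 0 releases the semaphore. */
static void hub_sem_write(struct hub_model *hub, uint32_t val)
{
	if (val == 0)
		hub->sem_held = 0;
}

/* Assumption: the hub refuses to power gate while the semaphore is held. */
static bool hub_try_power_gate(struct hub_model *hub)
{
	if (hub->sem_held)
		return false;
	hub->power_gated = true;
	hub->ack = 0; /* gating loses the ACK state */
	return true;
}

/*
 * Run one invalidation for vmid (must be < 32). If gate_races is true, a
 * power-gate attempt arrives after the hardware set the ACK bit but before
 * software read it. Returns true when the ACK was observed.
 */
static bool hub_invalidate(struct hub_model *hub, unsigned int vmid,
			   bool use_sem, bool gate_races)
{
	uint32_t bit = UINT32_C(1) << vmid;
	bool acked;

	if (use_sem) {
		unsigned int tries = 0;

		while (hub_sem_read(hub) != 1) {
			if (++tries == SEM_MAX_TRIES)
				return false; /* could not acquire */
		}
	}

	hub->power_gated = false; /* the request wakes the hub */
	hub->ack |= bit;          /* hardware completes and sets the ACK */

	if (gate_races)
		(void)hub_try_power_gate(hub);

	acked = (hub->ack & bit) != 0;
	hub->ack &= ~bit;

	if (use_sem)
		hub_sem_write(hub, 0); /* release */

	return acked;
}
```

In this model, calling hub_invalidate() with use_sem set observes the ACK even when a power-gate attempt races with it, while the same call without the semaphore loses the ACK, which is the situation the thread describes as a deadlock on real hardware.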

Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

>>the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.

I know,

> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

But ring->funcs->vmhub will always be AMDGPU_GFXHUB, right? This ring comes 
from "&kiq->ring".
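
If, as suggested above, the KIQ ring's own vmhub is always the GFX hub, then a check keyed on ring->funcs->vmhub would never take the MMHUB branch, even when the KIQ is being used to invalidate the MMHUB. The sketch below contrasts that check with one keyed on the hub being invalidated. Everything in it (enum hub_id, struct ring_model, both function names) is hypothetical and only illustrates the point; it is not the driver's API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's hub identifiers. */
enum hub_id { HUB_GFX, HUB_MM0, HUB_MM1 };

/* Hypothetical ring model: records only the hub the ring is attached to. */
struct ring_model {
	enum hub_id vmhub;
};

/* Check as written in the patch: keyed on the ring's own hub. */
static bool needs_sem_by_ring(const struct ring_model *ring)
{
	return ring->vmhub == HUB_MM0 || ring->vmhub == HUB_MM1;
}

/* Alternative: keyed on the hub that is actually being invalidated. */
static bool needs_sem_by_target(enum hub_id target)
{
	return target == HUB_MM0 || target == HUB_MM1;
}
```

With a KIQ-like ring whose vmhub is HUB_GFX, needs_sem_by_ring() is false regardless of which hub is flushed, whereas needs_sem_by_target(HUB_MM0) is true.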


>> Yes, agree. But since we now knew that we won't need that we can just drop 
>> this patch altogether.

Yeah, the semaphore wrapping is in PATCH 2/2, agree that this PATCH 1/2 could 
be dropped 


-Original Message-
From: Christian König  
Sent: November 20, 2019 22:39
To: Liu, Monk ; Zhu, Changfeng ; 
Koenig, Christian ; Xiao, Jack ; 
Zhou1, Tao ; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.

> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
Yes, agreed. But since we now know that we don't need it, we can just drop this 
patch altogether.

Regards,
Christian.

On 20.11.19 at 15:30, Liu, Monk wrote:
> Thanks for sharing this JIRA ticket
>
> now I got the picture of this issue from you and Christian.
>
> So the semaphore grabbing can prevent RTL to power off the MMHUB, I 
> see
>
> The practice is that SRIOV won't enable PG at all (even our GIM driver 
> won't enable PG, maybe in future we would enable it )
>
> I think I don't have too many concern about your patches,
>
> But I have comments on your patch 1:
>
> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t reg0, uint32_t reg1,
> - uint32_t ref, uint32_t mask)
> + uint32_t ref, uint32_t mask,
> + uint32_t sem)
>   {
>   struct amdgpu_kiq *kiq = >gfx.kiq;
>   struct amdgpu_ring *ring = >ring; @@ -144,9 +145,30 @@ void 
> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t seq;
>   
>   spin_lock_irqsave(>ring_lock, flags);
> - amdgpu_ring_alloc(ring, 32);
> + amdgpu_ring_alloc(ring, 60);
> +
> + /*
> +  * It may lose gpuvm invalidate acknowldege state across power-gating
> +  * off cycle, add semaphore acquire before invalidation and semaphore
> +  * release after invalidation to avoid entering power gated state
> +  * to WA the Issue
> +  */
> +
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
>
> See that in this routine, the ring is always KIQ, so below code looks 
> redundant :
>
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
>
> Thanks
>
> -Original Message-
> From: Zhu, Changfeng 
> Sent: November 20, 2019 22:17
> To: Koenig, Christian ; Liu, Monk 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
 Did Changfeng already hit this issue under SRIOV ???
> I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb .
> The problem is also seen by Zhou,Tao.
>
> And this is ticket:
> http://ontrack-internal.amd.com/browse/SWDEV-201459
>
> After the semaphore patch, the problem can be fixed.
>
> If SRIOV has concerns about this, the semaphore should not be added for SRIOV.
>
> However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/ 
> gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ 
> gmc_v10_0_emit_flush_gpu_tlb
>
> Or how can we handle the ticket above?
>
> BR,
> Changfeng.
>
> -Original Message-
> From: Christian König 
> Sent: Wednesday, November 20, 2019 10:00 PM
> To: Liu, Monk ; Koenig, Christian 
> ; Zhu, Changfeng ; 
> Xiao, Jack ; Zhou1, Tao ; Huang, 
> Ray ; Huang, Shimmer ; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
>> Did Changfeng already hit this issue under SRIOV ?
> I

Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


I think you can drop this KIQ patch anyway, or at least delay it.

We do want to use the hw semaphores for various reasons on the async rings.

But for the SW triggered flush we don't really have a need for this as 
long as GIM doesn't support power gating of the MMHUB.


Regards,
Christian.

On 20.11.19 at 15:50, Zhu, Changfeng wrote:

Well, I'll wait for help from the IPE GFX team, try to apply the semaphore 
to GFXHUB as well, and then finalize these invalidate semaphore patches.

If the SRIOV team wants to enable the invalidate semaphore in the future, 
it can pick this patch up again at that time.

BR,
Changfeng.

-Original Message-
From: Christian König 
Sent: Wednesday, November 20, 2019 10:39 PM
To: Liu, Monk ; Zhu, Changfeng ; Koenig, Christian 
; Xiao, Jack ; Zhou1, Tao ; Huang, Ray 
; Huang, Shimmer ; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.


Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not a helper function
dedicated to VM invalidation, so I don't think you should put the
semaphore read/write in this routine. Instead, you can keep the
semaphore read/write outside of this routine and put them only around
the VM invalidate logic.

Yes, agreed. But since we now know that we don't need it, we can just drop this 
patch altogether.

Regards,
Christian.

On 20.11.19 at 15:30, Liu, Monk wrote:

Thanks for sharing this JIRA ticket

now I got the picture of this issue from you and Christian.

So the semaphore grabbing can prevent RTL to power off the MMHUB, I
see

The practice is that SRIOV won't enable PG at all (even our GIM driver
won't enable PG, maybe in future we would enable it )

I think I don't have too many concern about your patches,

But I have comments on your patch 1:

void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
   {
struct amdgpu_kiq *kiq = >gfx.kiq;
struct amdgpu_ring *ring = >ring; @@ -144,9 +145,30 @@ void 
amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
   
   	spin_lock_irqsave(>ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);


See that in this routine, the ring is always KIQ, so below code looks redundant 
:

+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper
function that only serve VM invalidate, so I don't think You should
put the semaphore read/write in this routine, instead you can put
semaphore r/w out side of this routine and only Put them around the VM
invalidate logic

Thanks

-Original Message-
From: Zhu, Changfeng 
Sent: November 20, 2019 22:17
To: Koenig, Christian ; Liu, Monk
; Xiao, Jack ; Zhou1, Tao
; Huang, Ray ; Huang, Shimmer
; amd-gfx@lists.freedesktop.org
Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt


Did Changfeng already hit this issue under SRIOV ???

I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb .
The problem is also seen by Zhou,Tao.

And this is ticket:
http://ontrack-internal.amd.com/browse/SWDEV-201459

After the semaphore patch, the problem can be fixed.

If SRIOV has concerns about this, the semaphore should not be added for SRIOV.

However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/
gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/
gmc_v10_0_emit_flush_gpu_tlb

Or how can we handle the ticket above?

BR,
Changfeng.

-Original Message-
From: Christian König 
Sent: Wednesday, November 20, 2019 10:00 PM
To: Liu, Monk ; Koenig, Christian
; Zhu, Changfeng ;
Xiao, Jack ; Zhou1, Tao ; Huang,
Ray ; Huang, Shimmer ;
amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt


Did Changfeng already hit this issue under SRIOV ?

I don't think so, but Changfeng needs to answer this.

Question is does the extra semaphore acquire has some negative effect on SRIOV?

I

RE: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Zhu, Changfeng

Well, I'll wait for help from the IPE GFX team, try to apply the semaphore 
to GFXHUB as well, and then finalize these invalidate semaphore patches.

If the SRIOV team wants to enable the invalidate semaphore in the future, 
it can pick this patch up again at that time.

BR,
Changfeng.

-Original Message-
From: Christian König  
Sent: Wednesday, November 20, 2019 10:39 PM
To: Liu, Monk ; Zhu, Changfeng ; 
Koenig, Christian ; Xiao, Jack ; 
Zhou1, Tao ; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.

> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
Yes, agreed. But since we now know that we don't need it, we can just drop this 
patch altogether.

Regards,
Christian.

On 20.11.19 at 15:30, Liu, Monk wrote:
> Thanks for sharing this JIRA ticket
>
> now I got the picture of this issue from you and Christian.
>
> So the semaphore grabbing can prevent RTL to power off the MMHUB, I 
> see
>
> The practice is that SRIOV won't enable PG at all (even our GIM driver 
> won't enable PG, maybe in future we would enable it )
>
> I think I don't have too many concern about your patches,
>
> But I have comments on your patch 1:
>
> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t reg0, uint32_t reg1,
> - uint32_t ref, uint32_t mask)
> + uint32_t ref, uint32_t mask,
> + uint32_t sem)
>   {
>   struct amdgpu_kiq *kiq = >gfx.kiq;
>   struct amdgpu_ring *ring = >ring; @@ -144,9 +145,30 @@ void 
> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t seq;
>   
>   spin_lock_irqsave(>ring_lock, flags);
> - amdgpu_ring_alloc(ring, 32);
> + amdgpu_ring_alloc(ring, 60);
> +
> + /*
> +  * It may lose gpuvm invalidate acknowldege state across power-gating
> +  * off cycle, add semaphore acquire before invalidation and semaphore
> +  * release after invalidation to avoid entering power gated state
> +  * to WA the Issue
> +  */
> +
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
>
> See that in this routine, the ring is always KIQ, so below code looks 
> redundant :
>
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>
> Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper 
> function that only serve VM invalidate, so I don't think You should 
> put the semaphore read/write in this routine, instead you can put 
> semaphore r/w out side of this routine and only Put them around the VM 
> invalidate logic
>
> Thanks
>
> -Original Message-
> From: Zhu, Changfeng 
> Sent: November 20, 2019 22:17
> To: Koenig, Christian ; Liu, Monk 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
 Did Changfeng already hit this issue under SRIOV ???
> I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb .
> The problem is also seen by Zhou,Tao.
>
> And this is ticket:
> http://ontrack-internal.amd.com/browse/SWDEV-201459
>
> After the semaphore patch, the problem can be fixed.
>
> If SRIOV has concerns about this, the semaphore should not be added for SRIOV.
>
> However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/ 
> gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ 
> gmc_v10_0_emit_flush_gpu_tlb
>
> Or how can we handle the ticket above?
>
> BR,
> Changfeng.
>
> -Original Message-
> From: Christian König 
> Sent: Wednesday, November 20, 2019 10:00 PM
> To: Liu, Monk ; Koenig, Christian 
> ; Zhu, Changfeng ; 
> Xiao, Jack ; Zhou1, Tao ; Huang, 
> Ray ; Huang, Shimmer ; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
>> Did Changfeng already hit this issue under SRIOV ?
> I don't think so, but Changfeng needs to answer this.
>
> Question is does the extra semaphore acquire has some negative effect on 
> SRIOV?
>
> I would like to avoid having even more SRIOV specific handling in here which 
> we can't really test on bare metal.
>
> Christian.
>
> Am 20.11.19 um 14:54 schrieb Liu,

Re: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


Hi Monk,

the KIQ is used to invalidate both the GFXHUB as well as the MMHUB on Vega.


Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not a helper function 
dedicated to VM invalidation, so I don't think you should put the 
semaphore read/write in this routine. Instead, you can keep the 
semaphore read/write outside of this routine and put them only around 
the VM invalidate logic.

Yes, agreed. But since we now know that we don't need it, we can just 
drop this patch altogether.


Regards,
Christian.

On 20.11.19 at 15:30, Liu, Monk wrote:

Thanks for sharing this JIRA ticket

now I got the picture of this issue from you and Christian.

So the semaphore grabbing can prevent RTL to power off the MMHUB, I see

The practice is that SRIOV won't enable PG at all (even our GIM driver won't 
enable PG, maybe in future we would enable it )

I think I don't have too many concern about your patches,

But I have comments on your patch 1:

void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
  {
struct amdgpu_kiq *kiq = >gfx.kiq;
struct amdgpu_ring *ring = >ring; @@ -144,9 +145,30 @@ void 
amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
  
  	spin_lock_irqsave(>ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);


See that in this routine, the ring is always KIQ, so below code looks redundant 
:

+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not deadly a helper function 
that only serve VM invalidate, so I don't think
You should put the semaphore read/write in this routine, instead you can put 
semaphore r/w out side of this routine and only
Put them around the VM invalidate logic

Thanks

-Original Message-
From: Zhu, Changfeng 
Sent: November 20, 2019 22:17
To: Koenig, Christian ; Liu, Monk ; Xiao, Jack 
; Zhou1, Tao ; Huang, Ray ; Huang, 
Shimmer ; amd-gfx@lists.freedesktop.org
Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
amdgpu_virt


Did Changfeng already hit this issue under SRIOV ???

I meet this problem on navi14 under gmc_v10_0_emit_flush_gpu_tlb .
The problem is also seen by Zhou,Tao.

And this is ticket:
http://ontrack-internal.amd.com/browse/SWDEV-201459

After the semaphore patch, the problem can be fixed.

If SRIOV has concerns about this, the semaphore should not be added for SRIOV.

However, we should apply semaphore for gmc_v9_0_flush_gpu_tlb/ 
gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ 
gmc_v10_0_emit_flush_gpu_tlb

Or how can we handle the ticket above?

BR,
Changfeng.

-Original Message-
From: Christian König 
Sent: Wednesday, November 20, 2019 10:00 PM
To: Liu, Monk ; Koenig, Christian ; Zhu, Changfeng 
; Xiao, Jack ; Zhou1, Tao ; Huang, Ray 
; Huang, Shimmer ; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt


Did Changfeng already hit this issue under SRIOV ?

I don't think so, but Changfeng needs to answer this.

The question is: does the extra semaphore acquire have any negative effect on SRIOV?

I would like to avoid having even more SRIOV specific handling in here which we 
can't really test on bare metal.

Christian.

On 20.11.19 at 14:54, Liu, Monk wrote:

Hah, but in the SRIOV case our guest KMD driver is not allowed to do such
things (and even if there is a bug where the KMD tries to power gate, the
SMU firmware would not actually do it, since we have a PSP L1 policy
to prevent those dangerous operations).

Did Changfeng already hit this issue under SRIOV ???

-Original Message-
From: Koenig, Christian 
Sent: November 20, 2019 21:21
To: Liu, Monk ; Zhu, Changfeng
; Xiao, Jack ; Zhou1, Tao
; Huang, Ray ; Huang, Shimmer
; amd-gfx@lists.freedesktop.org
Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in 
progress which loses all bits in the ACK register and

Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

Thanks for sharing this JIRA ticket.

Now I get the picture of this issue from you and Christian.

So grabbing the semaphore prevents the RTL from powering off the MMHUB, I see.

In practice SRIOV won't enable PG at all (even our GIM driver won't 
enable PG; maybe we will enable it in the future).

I don't have many concerns about your patches, 

but I have comments on your patch 1:

 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
                                         uint32_t reg0, uint32_t reg1,
-                                        uint32_t ref, uint32_t mask)
+                                        uint32_t ref, uint32_t mask,
+                                        uint32_t sem)
 {
     struct amdgpu_kiq *kiq = &adev->gfx.kiq;
     struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
     uint32_t seq;
 
     spin_lock_irqsave(&kiq->ring_lock, flags);
-    amdgpu_ring_alloc(ring, 32);
+    amdgpu_ring_alloc(ring, 60);
+
+    /*
+     * It may lose gpuvm invalidate acknowledge state across power-gating
+     * off cycle, add semaphore acquire before invalidation and semaphore
+     * release after invalidation to avoid entering power gated state
+     * to WA the Issue
+     */
+
+    /* a read return value of 1 means semaphore acquire */
+    if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+        ring->funcs->vmhub == AMDGPU_MMHUB_1)
+            amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);


Note that in this routine the ring is always the KIQ, so the code below looks 
redundant:

+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

Besides, amdgpu_virt_kiq_reg_write_reg_wait() is not a helper function 
dedicated to VM invalidation, so I don't think you should put the 
semaphore read/write in this routine. Instead, you can keep the 
semaphore read/write outside of this routine and put them only around 
the VM invalidate logic.

Thanks 
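
The suggestion above, keeping the generic helper free of semaphore logic and bracketing only the VM invalidate caller, could look roughly like the following sketch. It is a self-contained model under stated assumptions, not driver code: struct ring_log, ring_emit() and flush_tlb() are hypothetical names, the "ring" merely records which packets would be emitted, and the simplified kiq_reg_write_reg_wait() here only borrows the shape of the real helper's argument list.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_MAX 8

enum op_kind { OP_REG_WAIT, OP_REG_WRITE_REG_WAIT, OP_WREG };

struct op {
	enum op_kind kind;
	uint32_t reg;
	uint32_t val;
};

/* Hypothetical stand-in for a ring: it only records emitted packets. */
struct ring_log {
	struct op ops[LOG_MAX];
	size_t count;
};

static void ring_emit(struct ring_log *ring, enum op_kind kind,
		      uint32_t reg, uint32_t val)
{
	if (ring->count < LOG_MAX) {
		struct op *slot = &ring->ops[ring->count++];

		slot->kind = kind;
		slot->reg = reg;
		slot->val = val;
	}
}

/* Generic helper: no semaphore knowledge, usable by any caller. */
static void kiq_reg_write_reg_wait(struct ring_log *ring, uint32_t reg0,
				   uint32_t reg1, uint32_t ref, uint32_t mask)
{
	(void)reg1;
	(void)mask;
	ring_emit(ring, OP_REG_WRITE_REG_WAIT, reg0, ref);
}

/* VM-invalidate caller: the only place that brackets with the semaphore. */
static void flush_tlb(struct ring_log *ring, int needs_sem, uint32_t sem,
		      uint32_t req, uint32_t ack, uint32_t ref)
{
	if (needs_sem)
		ring_emit(ring, OP_REG_WAIT, sem, 0x1); /* acquire: wait for 1 */

	kiq_reg_write_reg_wait(ring, req, ack, ref, ref);

	if (needs_sem)
		ring_emit(ring, OP_WREG, sem, 0);       /* release: write 0 */
}
```

Other users of the generic helper are unaffected in this arrangement, and the decision whether a semaphore is needed stays with the code that knows which hub is being invalidated.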

-Original Message-
From: Zhu, Changfeng  
Sent: November 20, 2019 22:17
To: Koenig, Christian ; Liu, Monk 
; Xiao, Jack ; Zhou1, Tao 
; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: RE: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
amdgpu_virt

>>> Did Changfeng already hit this issue under SRIOV ???

I hit this problem on navi14 in gmc_v10_0_emit_flush_gpu_tlb.
Zhou, Tao has seen the problem as well.

This is the ticket:
http://ontrack-internal.amd.com/browse/SWDEV-201459

The semaphore patch fixes the problem.

If SRIOV has concerns about this, the semaphore should not be added for SRIOV.

However, we should still apply the semaphore in gmc_v9_0_flush_gpu_tlb/ 
gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/ 
gmc_v10_0_emit_flush_gpu_tlb.

Otherwise, how can we handle the ticket above?

BR,
Changfeng.

-Original Message-
From: Christian König 
Sent: Wednesday, November 20, 2019 10:00 PM
To: Liu, Monk ; Koenig, Christian ; 
Zhu, Changfeng ; Xiao, Jack ; Zhou1, 
Tao ; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: Re: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
workaround in amdgpu_virt

> Did Changfeng already hit this issue under SRIOV ?
I don't think so, but Changfeng needs to answer this.

The question is: does the extra semaphore acquire have any negative effect on SRIOV?

I would like to avoid having even more SRIOV specific handling in here which we 
can't really test on bare metal.

Christian.

On 20.11.19 at 14:54, Liu, Monk wrote:
> Hah, but in the SRIOV case our guest KMD driver is not allowed to do such 
> things (and even if there is a bug where the KMD tries to power gate, the 
> SMU firmware would not actually do it, since we have a PSP L1 policy 
> to prevent those dangerous operations).
>
> Did Changfeng already hit this issue under SRIOV ???
>
> -Original Message-
> From: Koenig, Christian 
> Sent: November 20, 2019 21:21
> To: Liu, Monk ; Zhu, Changfeng 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
> Hi Monk,
>
> this is a fix for power gating the MMHUB.
>
> Basic problem is that the MMHUB can power gate while an invalidation is in 
> progress which loses all bits in the ACK register and so deadlocks the 
> engine waiting for the invalidation to finish.
>
> This bug is hit immediately when we enable power gating of the MMHUB.
>
> Regards,
> Christian.
>
> On 20.11.19 at 14:18, Liu, Monk wrote:
>> Hi Changfeng
>>
>> First of all, there is no power-gating off cycle involved in AMDGPU 
>> SRIOV, since we don't allow the VF/VM to do such things, so I find it 
>> strange that you posted something like this, especially for the VEGA10 
>> series, which doesn't seem to have any issue on

Re: [PATCH] drm/amdgpu: apply gpr/gds workaround before enabling GFX EDC mode

2019-11-20 Thread Deucher, Alexander

Reviewed-by: Alex Deucher 

From: Hawking Zhang 
Sent: Wednesday, November 20, 2019 7:36 AM
To: amd-gfx@lists.freedesktop.org ; Chen, Guchun 
; Clements, John ; Deucher, 
Alexander 
Cc: Zhang, Hawking 
Subject: [PATCH] drm/amdgpu: apply gpr/gds workaround before enabling GFX EDC 
mode

gfx memory should be initialized before enabling
DED and FUE field in mmGB_EDC_MODE

Change-Id: I248a087364cbd9858cba32a70be456af3f07c90d
Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 0d6965ba7a86..68359059e390 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4213,10 +4213,6 @@ static int gfx_v9_0_ecc_late_init(void *handle)
 struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 int r;

-   r = amdgpu_gfx_ras_late_init(adev);
-   if (r)
-   return r;
-
 r = gfx_v9_0_do_edc_gds_workarounds(adev);
 if (r)
 return r;
@@ -4226,6 +4222,10 @@ static int gfx_v9_0_ecc_late_init(void *handle)
 if (r)
 return r;

+   r = amdgpu_gfx_ras_late_init(adev);
+   if (r)
+   return r;
+
 return 0;
 }

--
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
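
The reordering in the patch above can be pictured with a small, self-contained C model. This is only a sketch under assumptions: struct gfx_model and all function names are hypothetical, the two workaround steps follow the gpr/gds wording of the patch subject, and the model treats enabling EDC on uninitialized memory as an error so that the ordering dependency becomes visible (the real driver does not report it this way).

```c
#include <stdbool.h>

/* Hypothetical model state; the real driver does not track these flags. */
struct gfx_model {
	bool gds_initialized;
	bool gpr_initialized;
	bool edc_enabled;
};

/* Stand-in for the GDS workaround: initializes GDS memory. */
static int do_gds_workaround(struct gfx_model *gfx)
{
	gfx->gds_initialized = true;
	return 0;
}

/* Stand-in for the GPR workaround: initializes the register files. */
static int do_gpr_workaround(struct gfx_model *gfx)
{
	gfx->gpr_initialized = true;
	return 0;
}

/*
 * Stand-in for the RAS late init that enables the EDC mode fields.
 * Model assumption: enabling EDC before memory init is flagged as an error.
 */
static int ras_late_init(struct gfx_model *gfx)
{
	if (!gfx->gds_initialized || !gfx->gpr_initialized)
		return -1;
	gfx->edc_enabled = true;
	return 0;
}

/* Order after the patch: workarounds first, RAS late init last. */
static int ecc_late_init(struct gfx_model *gfx)
{
	int r;

	r = do_gds_workaround(gfx);
	if (r)
		return r;

	r = do_gpr_workaround(gfx);
	if (r)
		return r;

	return ras_late_init(gfx);
}
```

Calling ras_late_init() on a freshly zeroed model fails, while ecc_late_init() succeeds because both workaround steps run first.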

Re: [PATCH v2 1/3] drm/amdgpu: define soc15_ras_field_entry for reuse

2019-11-20 Thread Deucher, Alexander

Series is:
Reviewed-by: Alex Deucher 

From: Dennis Li 
Sent: Wednesday, November 20, 2019 5:49 AM
To: amd-gfx@lists.freedesktop.org ; Deucher, 
Alexander ; Zhou1, Tao ; Zhang, 
Hawking ; Chen, Guchun 
Cc: Li, Dennis ; Li, Dennis 
Subject: [PATCH v2 1/3] drm/amdgpu: define soc15_ras_field_entry for reuse

The struct soc15_ras_field_entry will be reused by
other IPs, such as mmhub and gc

v2: rename ras_subblock_regs to gc_ras_fields_vg20,
because future ASICs may have a different table.

Change-Id: I6c3388a09b5fbf927ad90fcd626baa448d1681a6
Signed-off-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 34 +--
 drivers/gpu/drm/amd/amdgpu/soc15.h| 12 ++
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index c7ae685d6f74..8073fcd4720e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -131,18 +131,6 @@ MODULE_FIRMWARE("amdgpu/renoir_rlc.bin");
 #define mmTCP_CHAN_STEER_5_ARCT                 0x0b0c
 #define mmTCP_CHAN_STEER_5_ARCT_BASE_IDX        0

-struct ras_gfx_subblock_reg {
-   const char *name;
-   uint32_t hwip;
-   uint32_t inst;
-   uint32_t seg;
-   uint32_t reg_offset;
-   uint32_t sec_count_mask;
-   uint32_t sec_count_shift;
-   uint32_t ded_count_mask;
-   uint32_t ded_count_shift;
-};
-
 enum ta_ras_gfx_subblock {
 /*CPC*/
 TA_RAS_BLOCK__GFX_CPC_INDEX_START = 0,
@@ -5487,7 +5475,7 @@ static int gfx_v9_0_priv_inst_irq(struct amdgpu_device *adev,
 }


-static const struct ras_gfx_subblock_reg ras_subblock_regs[] = {
+static const struct soc15_ras_field_entry gc_ras_fields_vg20[] = {
 { "CPC_SCRATCH", SOC15_REG_ENTRY(GC, 0, mmCPC_EDC_SCRATCH_CNT),
   SOC15_REG_FIELD(CPC_EDC_SCRATCH_CNT, SEC_COUNT),
   SOC15_REG_FIELD(CPC_EDC_SCRATCH_CNT, DED_COUNT)
@@ -6146,29 +6134,29 @@ static int __get_ras_error_count(const struct soc15_reg_entry *reg,
 uint32_t i;
 uint32_t sec_cnt, ded_cnt;

-   for (i = 0; i < ARRAY_SIZE(ras_subblock_regs); i++) {
-   if(ras_subblock_regs[i].reg_offset != reg->reg_offset ||
-   ras_subblock_regs[i].seg != reg->seg ||
-   ras_subblock_regs[i].inst != reg->inst)
+   for (i = 0; i < ARRAY_SIZE(gc_ras_fields_vg20); i++) {
+   if(gc_ras_fields_vg20[i].reg_offset != reg->reg_offset ||
+   gc_ras_fields_vg20[i].seg != reg->seg ||
+   gc_ras_fields_vg20[i].inst != reg->inst)
 continue;

 sec_cnt = (value &
-   ras_subblock_regs[i].sec_count_mask) >>
-   ras_subblock_regs[i].sec_count_shift;
+   gc_ras_fields_vg20[i].sec_count_mask) >>
+   gc_ras_fields_vg20[i].sec_count_shift;
 if (sec_cnt) {
 DRM_INFO("GFX SubBlock %s, Instance[%d][%d], SEC %d\n",
-   ras_subblock_regs[i].name,
+   gc_ras_fields_vg20[i].name,
 se_id, inst_id,
 sec_cnt);
 *sec_count += sec_cnt;
 }

 ded_cnt = (value &
-   ras_subblock_regs[i].ded_count_mask) >>
-   ras_subblock_regs[i].ded_count_shift;
+   gc_ras_fields_vg20[i].ded_count_mask) >>
+   gc_ras_fields_vg20[i].ded_count_shift;
 if (ded_cnt) {
 DRM_INFO("GFX SubBlock %s, Instance[%d][%d], DED %d\n",
-   ras_subblock_regs[i].name,
+   gc_ras_fields_vg20[i].name,
 se_id, inst_id,
 ded_cnt);
 *ded_count += ded_cnt;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.h b/drivers/gpu/drm/amd/amdgpu/soc15.h
index 9af6c6ffbfa2..344280b869c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.h
@@ -60,6 +60,18 @@ struct soc15_allowed_register_entry {
 bool grbm_indexed;
 };

+struct soc15_ras_field_entry {
+   const char *name;
+   uint32_t hwip;
+   uint32_t inst;
+   uint32_t seg;
+   uint32_t reg_offset;
+   uint32_t sec_count_mask;
+   uint32_t sec_count_shift;
+   uint32_t ded_count_mask;
+   uint32_t ded_count_shift;
+};
+
 #define SOC15_REG_ENTRY(ip, inst, reg)  ip##_HWIP, inst, reg##_BASE_IDX, reg

 #define SOC15_REG_ENTRY_OFFSET(entry)   (adev->reg_offset[entry.hwip][entry.inst][entry.seg] + entry.reg_offset)
--
2.17.1
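
The lookup that this patch renames can be sketched outside the kernel as follows. This is only an illustration: the struct layout is copied from the patch, but the table contents (names, offsets, masks, shifts) and every demo_* identifier are made up here and are not real GC register fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct soc15_ras_field_entry in the patch. */
struct demo_ras_field_entry {
	const char *name;
	uint32_t hwip;
	uint32_t inst;
	uint32_t seg;
	uint32_t reg_offset;
	uint32_t sec_count_mask;
	uint32_t sec_count_shift;
	uint32_t ded_count_mask;
	uint32_t ded_count_shift;
};

/* Hypothetical table: all values below are invented for illustration. */
static const struct demo_ras_field_entry demo_fields[] = {
	{ "DEMO_A", 0, 0, 0, 0x10, 0x00000003, 0, 0x0000000c, 2 },
	{ "DEMO_B", 0, 0, 1, 0x20, 0x000000f0, 4, 0x00000f00, 8 },
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * For every table entry that matches (inst, seg, reg_offset), extract the
 * SEC and DED counters from 'value' with the entry's mask and shift and add
 * them to the running totals. Returns the number of matching entries.
 */
static int demo_get_ras_error_count(uint32_t inst, uint32_t seg,
				    uint32_t reg_offset, uint32_t value,
				    uint32_t *sec_count, uint32_t *ded_count)
{
	size_t i;
	int matches = 0;

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_fields); i++) {
		const struct demo_ras_field_entry *e = &demo_fields[i];

		if (e->reg_offset != reg_offset || e->seg != seg ||
		    e->inst != inst)
			continue;

		*sec_count += (value & e->sec_count_mask) >> e->sec_count_shift;
		*ded_count += (value & e->ded_count_mask) >> e->ded_count_shift;
		matches++;
	}
	return matches;
}
```

Because the table is selected by name only, another IP block or a future ASIC could supply its own array of the same struct type without touching the extraction loop, which appears to be the reuse the commit message has in mind.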

RE: RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

>> Question is does the extra semaphore acquire has some negative effect on 
>> SRIOV?

Hi Christian,

On further thought, I think having the KIQ get/put the semaphore before and
after the VM invalidate won't introduce a world switch issue. A world switch
only ruins a VM invalidate if it occurs in the middle of the invalidate
itself, so if the VF is preempted right after the semaphore read, that's fine:
the VM invalidate has not even begun.

But regarding this patch, there is something I'm not clear about:

>>  struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>  struct amdgpu_ring *ring = &kiq->ring;
>> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>  uint32_t seq;
>>
>>  spin_lock_irqsave(&kiq->ring_lock, flags);
>> -amdgpu_ring_alloc(ring, 32);
>> +amdgpu_ring_alloc(ring, 60);
>> +
>> +/*
>> + * It may lose gpuvm invalidate acknowldege state across power-gating
>> + * off cycle, add semaphore acquire before invalidation and semaphore
>> + * release after invalidation to avoid entering power gated state
>> + * to WA the Issue
>> + */
>> +
>> +/* a read return value of 1 means semaphore acuqire */
>> +if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>> +ring->funcs->vmhub == AMDGPU_MMHUB_1)
>> +amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);

Note that the ring here is &kiq->ring, so why does it need to check "if
(ring->funcs->vmhub == AMDGPU_MMHUB_0"? The KIQ is obviously not on the MMHUB
but on the GFXHUB...

Besides, why would a semaphore read before the VM invalidate prevent power
gating? I can't tell from the patch.
Is there another change that also uses the KIQ to grab the semaphore before
trying to power gate?


Thanks

-Original Message-
From: Christian König
Sent: 2019-11-20 22:00
To: Liu, Monk ; Koenig, Christian 
; Zhu, Changfeng ; Xiao, Jack 
; Zhou1, Tao ; Huang, Ray 
; Huang, Shimmer ; 
amd-gfx@lists.freedesktop.org
Subject: Re: RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

> Did Changfeng already hit this issue under SRIOV ?
I don't think so, but Changfeng needs to answer this.

Question is does the extra semaphore acquire has some negative effect on SRIOV?

I would like to avoid having even more SRIOV specific handling in here which we 
can't really test on bare metal.

Christian.

On 20.11.19 at 14:54, Liu, Monk wrote:
> Hah, but in SRIOV case, our guest KMD driver is not allowed to do such 
> things  (and even there is a bug that KMD try to power gate, the 
> SMU firmware would not really do the jobs since We have PSP L1 policy 
> to prevent those danger operations )
>
> Did Changfeng already hit this issue under SRIOV ???
>
> -Original Message-
> From: Koenig, Christian
> Sent: 2019-11-20 21:21
> To: Liu, Monk ; Zhu, Changfeng 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
> workaround in amdgpu_virt
>
> Hi Monk,
>
> this is a fix for power gating the MMHUB.
>
> Basic problem is that the MMHUB can power gate while an invalidation is in 
> progress which looses all bits in the ACK register and so deadlocks the 
> engine waiting for the invalidation to finish.
>
> This bug is hit immediately when we enable power gating of the MMHUB.
>
> Regards,
> Christian.
>
> On 20.11.19 at 14:18, Liu, Monk wrote:
>> Hi Changfeng
>>
>> Firs of all, there is no power-gating off circle involved in AMDGPU 
>> SRIOV, since we don't allow VF/VM do such things so I do feel strange 
>> why you post something like this Especially on VEGA10 serials which 
>> looks doesn't have any issue on those gpu_flush part
>>
>> Here is my questions for you:
>> 1) Can you point me what issue had you been experienced ? and how to 
>> repro the bug
>> 2) if you do hit some issues, did you verified that your patch can fix it ?
>>
>> besides
>>
>> /Monk
>>
>> -Original Message-
>> From: amd-gfx on behalf of Changfeng.Zhu
>> Sent: 2019-11-20 17:14
>> To: Koenig, Christian ; Xiao, Jack 
>> ; Zhou1, Tao ; Huang, Ray 
>> ; Huang, Shimmer ; 
>> amd-gfx@lists.freedesktop.org
>> Cc: Zhu, Changfeng
>> Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
>> amdgpu_virt
>>
>> From: changzhu 
>>
>> It may lose gpuvm invalidate acknowldege state across power-gating off 
>> cycle. To avoid this issue in virt invalidation, add semaphore acquire 
>> before invalidation and semaphore release after invalidation.
>>
>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>> Signed-off-by: changzhu 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>>3 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index f04eb1a64271..70ffaf91cd12 100644
>> ---

RE: RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Zhu, Changfeng

>>> Did Changfeng already hit this issue under SRIOV ???

I hit this problem on navi14 in gmc_v10_0_emit_flush_gpu_tlb.
Zhou, Tao has seen the problem as well.

This is the ticket:
http://ontrack-internal.amd.com/browse/SWDEV-201459

With the semaphore patch applied, the problem is fixed.

If SRIOV has concerns about this, we need not add the semaphore for SRIOV.

However, we should apply the semaphore in gmc_v9_0_flush_gpu_tlb/
gmc_v9_0_emit_flush_gpu_tlb/ gmc_v10_0_flush_gpu_tlb/
gmc_v10_0_emit_flush_gpu_tlb.

Otherwise, how can we handle the ticket above?

BR,
Changfeng.
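
The acquire / invalidate / release bracket described in the patch comments can be modeled in plain C as below. This is a toy sketch, not driver code: struct demo_hub and all demo_* functions are hypothetical, and the rule that the hub will not power gate while the semaphore is held is taken from the patch's stated intent rather than from any hardware documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one hub's invalidation engine; not real hardware state. */
struct demo_hub {
	bool sem_held;     /* invalidation semaphore is currently owned */
	bool power_gated;  /* hub has entered the power-gated state */
	uint32_t ack;      /* invalidation acknowledge bits */
};

/* Reading the semaphore register: a return value of 1 means "acquired". */
static uint32_t demo_sem_read(struct demo_hub *hub)
{
	if (hub->sem_held)
		return 0;
	hub->sem_held = true;
	return 1;
}

/* Writing 0 to the semaphore register releases it. */
static void demo_sem_write(struct demo_hub *hub, uint32_t v)
{
	if (v == 0)
		hub->sem_held = false;
}

/* Model assumption: power gating is refused while the semaphore is held. */
static bool demo_try_power_gate(struct demo_hub *hub)
{
	if (hub->sem_held)
		return false;
	hub->power_gated = true;
	hub->ack = 0; /* ack state is lost across the power-gating cycle */
	return true;
}

/*
 * Issue one invalidation for 'vmid'. When 'gate_midway' is set, a power-gate
 * attempt races with the invalidation. Returns true if the ack bit is still
 * visible afterwards, false if it was lost (the case that would hang).
 */
static bool demo_flush_tlb(struct demo_hub *hub, unsigned int vmid,
			   bool use_sem, bool gate_midway)
{
	uint32_t bit = 1u << (vmid & 31u);
	bool acked;
	int tries;

	if (use_sem) {
		for (tries = 0; tries < 100; tries++)
			if (demo_sem_read(hub) == 1)
				break;
		if (tries == 100)
			return false; /* could not acquire the semaphore */
	}

	hub->ack |= bit; /* the engine acknowledges the request */
	if (gate_midway)
		demo_try_power_gate(hub);
	acked = (hub->ack & bit) != 0;

	if (use_sem)
		demo_sem_write(hub, 0);
	return acked;
}
```

In this model the unbracketed flush loses its ack bit when the power-gate attempt lands in the middle, while the bracketed flush keeps it. Whether the real MMHUB behaves this way is exactly what is being asked elsewhere in the thread.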

-Original Message-
From: Christian König  
Sent: Wednesday, November 20, 2019 10:00 PM
To: Liu, Monk ; Koenig, Christian ; 
Zhu, Changfeng ; Xiao, Jack ; Zhou1, 
Tao ; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org
Subject: Re: RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
workaround in amdgpu_virt

> Did Changfeng already hit this issue under SRIOV ?
I don't think so, but Changfeng needs to answer this.

Question is does the extra semaphore acquire has some negative effect on SRIOV?

I would like to avoid having even more SRIOV specific handling in here which we 
can't really test on bare metal.

Christian.

On 20.11.19 at 14:54, Liu, Monk wrote:
> Hah, but in SRIOV case, our guest KMD driver is not allowed to do such 
> things  (and even there is a bug that KMD try to power gate, the 
> SMU firmware would not really do the jobs since We have PSP L1 policy 
> to prevent those danger operations )
>
> Did Changfeng already hit this issue under SRIOV ???
>
> -Original Message-
> From: Koenig, Christian
> Sent: 2019-11-20 21:21
> To: Liu, Monk ; Zhu, Changfeng 
> ; Xiao, Jack ; Zhou1, Tao 
> ; Huang, Ray ; Huang, Shimmer 
> ; amd-gfx@lists.freedesktop.org
> Subject: Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore
> workaround in amdgpu_virt
>
> Hi Monk,
>
> this is a fix for power gating the MMHUB.
>
> Basic problem is that the MMHUB can power gate while an invalidation is in 
> progress which looses all bits in the ACK register and so deadlocks the 
> engine waiting for the invalidation to finish.
>
> This bug is hit immediately when we enable power gating of the MMHUB.
>
> Regards,
> Christian.
>
> On 20.11.19 at 14:18, Liu, Monk wrote:
>> Hi Changfeng
>>
>> Firs of all, there is no power-gating off circle involved in AMDGPU 
>> SRIOV, since we don't allow VF/VM do such things so I do feel strange 
>> why you post something like this Especially on VEGA10 serials which 
>> looks doesn't have any issue on those gpu_flush part
>>
>> Here is my questions for you:
>> 1) Can you point me what issue had you been experienced ? and how to 
>> repro the bug
>> 2) if you do hit some issues, did you verified that your patch can fix it ?
>>
>> besides
>>
>> /Monk
>>
>> -Original Message-
>> From: amd-gfx on behalf of Changfeng.Zhu
>> Sent: 2019-11-20 17:14
>> To: Koenig, Christian ; Xiao, Jack 
>> ; Zhou1, Tao ; Huang, Ray 
>> ; Huang, Shimmer ; 
>> amd-gfx@lists.freedesktop.org
>> Cc: Zhu, Changfeng
>> Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
>> amdgpu_virt
>>
>> From: changzhu 
>>
>> It may lose gpuvm invalidate acknowldege state across power-gating off 
>> cycle. To avoid this issue in virt invalidation, add semaphore acquire 
>> before invalidation and semaphore release after invalidation.
>>
>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>> Signed-off-by: changzhu 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>>3 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index f04eb1a64271..70ffaf91cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
>> *adev, uint32_t reg, uint32_t v)
>>
>>void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>  uint32_t reg0, uint32_t reg1,
>> -uint32_t ref, uint32_t mask)
>> +uint32_t ref, uint32_t mask,
>> +uint32_t sem)
>>{
>>  struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>  struct amdgpu_ring *ring = &kiq->ring;
>> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>  uint32_t seq;
>>
>>  spin_lock_irqsave(&kiq->ring_lock, flags);
>> -amdgpu_ring_alloc(ring, 32);
>> +amdgpu_ring_alloc(ring, 60);
>> +
>> +/*
>> + * It may lose gpuvm invalidate acknowldege state across power-gating
>> + * off cycle, add semaphore acquire before invalidation and semaphore
>> + * release after invalidation to avoid entering power gated state
>> + * to

Re: RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


Did Changfeng already hit this issue under SRIOV ?

I don't think so, but Changfeng needs to answer this.

The question is: does the extra semaphore acquire have some negative effect
on SRIOV?


I would like to avoid having even more SRIOV specific handling in here 
which we can't really test on bare metal.


Christian.

On 20.11.19 at 14:54, Liu, Monk wrote:

Hah, but in SRIOV case, our guest KMD driver is not allowed to do such things 
 (and even there is a bug that KMD try to power gate, the SMU firmware 
would not really do the jobs since
We have PSP L1 policy to prevent those danger operations )

Did Changfeng already hit this issue under SRIOV ???

-Original Message-
From: Koenig, Christian
Sent: 2019-11-20 21:21
To: Liu, Monk ; Zhu, Changfeng ; Xiao, Jack 
; Zhou1, Tao ; Huang, Ray ; Huang, 
Shimmer ; amd-gfx@lists.freedesktop.org
Subject: Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in 
progress which looses all bits in the ACK register and so deadlocks the engine 
waiting for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

On 20.11.19 at 14:18, Liu, Monk wrote:

Hi Changfeng

Firs of all, there is no power-gating off circle involved in AMDGPU
SRIOV, since we don't allow VF/VM do such things so I do feel strange
why you post something like this Especially on VEGA10 serials which
looks doesn't have any issue on those gpu_flush part

Here is my questions for you:
1) Can you point me what issue had you been experienced ? and how to
repro the bug
2) if you do hit some issues, did you verified that your patch can fix it ?

besides

/Monk

-Original Message-
From: amd-gfx on behalf of Changfeng.Zhu
Sent: 2019-11-20 17:14
To: Koenig, Christian ; Xiao, Jack
; Zhou1, Tao ; Huang, Ray
; Huang, Shimmer ;
amd-gfx@lists.freedesktop.org
Cc: Zhu, Changfeng
Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

From: changzhu 

It may lose gpuvm invalidate acknowldege state across power-gating off cycle. 
To avoid this issue in virt invalidation, add semaphore acquire before 
invalidation and semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
   3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)
   
   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
   {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;

spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
uint32_t reg);  void

RE: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

Hah, but in the SRIOV case our guest KMD driver is not allowed to do such
things (and even if a bug made the KMD try to power gate, the SMU firmware
would not really do the job, since we have a PSP L1 policy to prevent those
dangerous operations).

Did Changfeng already hit this issue under SRIOV?

-Original Message-
From: Koenig, Christian
Sent: 2019-11-20 21:21
To: Liu, Monk ; Zhu, Changfeng ; 
Xiao, Jack ; Zhou1, Tao ; Huang, Ray 
; Huang, Shimmer ; 
amd-gfx@lists.freedesktop.org
Subject: Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

Hi Monk,

this is a fix for power gating the MMHUB.

Basic problem is that the MMHUB can power gate while an invalidation is in 
progress which looses all bits in the ACK register and so deadlocks the engine 
waiting for the invalidation to finish.

This bug is hit immediately when we enable power gating of the MMHUB.

Regards,
Christian.

On 20.11.19 at 14:18, Liu, Monk wrote:
> Hi Changfeng
>
> Firs of all, there is no power-gating off circle involved in AMDGPU 
> SRIOV, since we don't allow VF/VM do such things so I do feel strange 
> why you post something like this Especially on VEGA10 serials which 
> looks doesn't have any issue on those gpu_flush part
>
> Here is my questions for you:
> 1) Can you point me what issue had you been experienced ? and how to 
> repro the bug
> 2) if you do hit some issues, did you verified that your patch can fix it ?
>
> besides
>
> /Monk
>
> -Original Message-
> From: amd-gfx on behalf of Changfeng.Zhu
> Sent: 2019-11-20 17:14
> To: Koenig, Christian ; Xiao, Jack 
> ; Zhou1, Tao ; Huang, Ray 
> ; Huang, Shimmer ; 
> amd-gfx@lists.freedesktop.org
> Cc: Zhu, Changfeng
> Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
> amdgpu_virt
>
> From: changzhu 
>
> It may lose gpuvm invalidate acknowldege state across power-gating off cycle. 
> To avoid this issue in virt invalidation, add semaphore acquire before 
> invalidation and semaphore release after invalidation.
>
> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--  
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>   3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index f04eb1a64271..70ffaf91cd12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
> *adev, uint32_t reg, uint32_t v)
>   
>   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t reg0, uint32_t reg1,
> - uint32_t ref, uint32_t mask)
> + uint32_t ref, uint32_t mask,
> + uint32_t sem)
>   {
>   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   struct amdgpu_ring *ring = &kiq->ring;
> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t seq;
>   
>   spin_lock_irqsave(&kiq->ring_lock, flags);
> - amdgpu_ring_alloc(ring, 32);
> + amdgpu_ring_alloc(ring, 60);
> +
> + /*
> +  * It may lose gpuvm invalidate acknowldege state across power-gating
> +  * off cycle, add semaphore acquire before invalidation and semaphore
> +  * release after invalidation to avoid entering power gated state
> +  * to WA the Issue
> +  */
> +
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
> +
>   amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>   ref, mask);
> + /*
> +  * add semaphore release after invalidation,
> +  * write with 0 means semaphore release
> +  */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_wreg(ring, sem, 0);
> +
>   amdgpu_fence_emit_polling(ring, &seq);
>   amdgpu_ring_commit(ring);
>   spin_unlock_irqrestore(&kiq->ring_lock, flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index b0b2bdc750df..bda6a2f37dc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
> uint32_t reg);  void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
> uint32_t reg, uint32_t v);  void amdgpu_virt_kiq_reg_write_reg_wait(struct 
> amdgpu_device *adev,
>   uint32_t reg0, uint32_t rreg1,
> -

Re: [PATCH] drm/dsc: Return unsigned long on compute offset

2019-11-20 Thread Mikita Lipski




On 20/11/2019 05:17, Ville Syrjälä wrote:

On Tue, Nov 19, 2019 at 04:11:43PM -0500, Mikita Lipski wrote:



On 19/11/2019 16:09, Mikita Lipski wrote:



On 19/11/2019 12:11, Ville Syrjälä wrote:

On Tue, Nov 19, 2019 at 04:59:40PM +, Cornij, Nikola wrote:

If you're going to make all of them the same, then u64, please.

This is because I'm not sure if calculations require 64-bit at some
stage.


If it does then it's already broken. Someone should probably figure out
what's actually needed instead of shooting ducks with an icbm.




Sorry, I made a typo below. It was supposed to be "I don't think it is broken".


I mean that it's broken if it actually needs u64 when it's
currently using unsigned long. So u64 is either overkill or the
code is currently broken.



None of the calculations exceed u32, so u64 would be overkill, since 
none of the variables in the structure exceed 16 bits. Therefore u32 is 
enough.





I don't think it is not broken, cause I'm currently testing DSC.
The patch I sent earlier simply fixes the error of comparing signed and
unsigned variables.

We can then submit a second patch addressing the issue of using unsigned
long int instead of u32. Also, since the variables in drm_dsc_config
structure are all of type u8 and u16, the calculation values shouldn't
exceed the size of u32.

Thanks



-Original Message-
From: Lipski, Mikita 
Sent: November 19, 2019 10:08 AM
To: Ville Syrjälä ; Lipski, Mikita

Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
Cornij, Nikola 
Subject: Re: [PATCH] drm/dsc: Return unsigned long on compute offset



On 19/11/2019 09:56, Ville Syrjälä wrote:

On Tue, Nov 19, 2019 at 09:45:26AM -0500, mikita.lip...@amd.com wrote:

From: Mikita Lipski 

We shouldn't compare int with unsigned long to find the max value and
since we are not expecting negative value returned from
compute_offset we should make this function return unsigned long so
we can compare the values when computing rc parameters.
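
The hazard behind mixing int and unsigned long when picking a maximum can be shown with a small stand-alone sketch. The demo_* helpers are hypothetical and are not part of drm_dsc.c; the explicit cast only spells out the conversion that C would otherwise apply implicitly.

```c
#include <stdbool.h>

/*
 * Mixed comparison: C converts the int operand to unsigned long before
 * comparing. The cast below makes that implicit conversion visible.
 */
static bool demo_mixed_less(int a, unsigned long b)
{
	return (unsigned long)a < b;
}

/* Same-type comparison once both operands are known to be non-negative. */
static bool demo_unsigned_less(unsigned long a, unsigned long b)
{
	return a < b;
}

/* Maximum of two values of the same unsigned type. */
static unsigned long demo_max_ul(unsigned long a, unsigned long b)
{
	return demo_unsigned_less(a, b) ? b : a;
}
```

With these helpers, demo_mixed_less(-1, 1) is false, because -1 becomes the largest unsigned long value. Returning an unsigned type from the offset computation, as the patch does, keeps both sides of the comparison in the same type so this case cannot arise.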


Why are there other unsigned longs in dsc parameter computation in the
first place?


I believe it was initially set to be unsigned long for variable
consistency, when we ported intel_compute_rc_parameters into
drm_dsc_compute_rc_parameters. But now that I look at it, we can
actually just set them to u32 or u64, as nothing should exceed that.




Cc: Nikola Cornij 
Cc: Harry Wentland 
Signed-off-by: Mikita Lipski 
---
    drivers/gpu/drm/drm_dsc.c | 6 +++---
    1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_dsc.c b/drivers/gpu/drm/drm_dsc.c
index 74f3527f567d..ec40604ab6a2 100644
--- a/drivers/gpu/drm/drm_dsc.c
+++ b/drivers/gpu/drm/drm_dsc.c
@@ -245,11 +245,11 @@ void drm_dsc_pps_payload_pack(struct
drm_dsc_picture_parameter_set *pps_payload,
    }
    EXPORT_SYMBOL(drm_dsc_pps_payload_pack);
-static int compute_offset(struct drm_dsc_config *vdsc_cfg, int
pixels_per_group,
+static unsigned long compute_offset(struct drm_dsc_config
*vdsc_cfg, int pixels_per_group,
    int groups_per_line, int grpcnt)
    {
-    int offset = 0;
-    int grpcnt_id = DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay,
pixels_per_group);
+    unsigned long offset = 0;
+    unsigned long grpcnt_id =
DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay, pixels_per_group);
    if (grpcnt <= grpcnt_id)
    offset = DIV_ROUND_UP(grpcnt * pixels_per_group *
vdsc_cfg->bits_per_pixel, 16);
--
2.17.1

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




--
Thanks,
Mikita Lipski
Software Engineer 2, AMD
mikita.lip...@amd.com






--
Thanks,
Mikita Lipski
Software Engineer 2, AMD
mikita.lip...@amd.com




--
Thanks,
Mikita Lipski
Software Engineer 2, AMD
mikita.lip...@amd.com

Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


OK, in this case we should just drop this patch.

Any objections to the other semaphore patch? IIRC we added 
amdgpu_ring_emit_reg_write_reg_wait() specifically to make sure that an 
invalidation can't be interrupted by a world switch.


When we add a manual semaphore acquire/release before and after the 
invalidation, that could break this quite badly.
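
The concern can be made concrete with a toy model of the packets emitted on the ring. Everything here is an assumption for illustration: the demo_pkt names are hypothetical, and the model simply treats every boundary between two packets as a point where a world switch might occur.

```c
#include <stddef.h>

/* Hypothetical packet kinds, loosely mirroring what the patch emits. */
enum demo_pkt {
	DEMO_SEM_ACQUIRE,    /* wait until the semaphore register reads 1 */
	DEMO_WRITE_REG_WAIT, /* combined invalidate request and ack wait */
	DEMO_SEM_RELEASE,    /* write 0 to the semaphore register */
	DEMO_FENCE           /* fence emitted after the sequence */
};

/*
 * Count the packet boundaries that fall between an acquire and its release,
 * i.e. the points where, in this model, a world switch would happen with the
 * semaphore still held.
 */
static int demo_switch_points_while_held(const enum demo_pkt *pkts, size_t n)
{
	int held = 0;
	int points = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (held)
			points++; /* boundary before packet i, semaphore held */
		if (pkts[i] == DEMO_SEM_ACQUIRE)
			held = 1;
		else if (pkts[i] == DEMO_SEM_RELEASE)
			held = 0;
	}
	return points;
}
```

For the original sequence (write/wait, fence) the count is zero. For the bracketed sequence (acquire, write/wait, release, fence) it is two: one before the invalidation and one before the release.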


Regards,
Christian.

On 20.11.19 at 14:30, Liu, Monk wrote:

Question for Emily and Monk: Do we support power gating of the MMHUB with 
SRIOV? I don't think so and when that's correct we could just drop this patch.

No power gating is allowed to be controlled by a VF in a guest VM.

It is the hypervisor driver's (GIM) responsibility to decide when our hardware
can enter a power cycle (e.g. BACO reset), and we have a software mechanism to
make sure a power-gating-off cycle only happens when all engines are idle (or
any of them has hung).

And even if some engine isn't hung (e.g. the KIQ is still doing things like
reading a register or gpu_flush_tlb, etc.), if GIM decides to power off the
GPU (BACO reset), that's okay for the KIQ, since after BACO all engines are
re-initialized anyway.

Thanks

/Monk

-Original Message-
From: Christian König
Sent: 2019-11-20 19:24
To: Zhu, Changfeng ; Koenig, Christian ; Xiao, Jack 
; Zhou1, Tao ; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org; Deng, Emily ; Liu, Monk 

Subject: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

Hi Changfeng,

[adding Monk and Emily as well].

I thought more about this and came to the conclusion that this won't work and 
might result in a lockup as well.

We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic 
read/modify/write cycle since we found that the invalidation engine is resetted 
with every world switch.

Now accessing the semaphore registers is not atomic any more and we could have 
a world switch in between grabbing the semaphore and sending the VM 
invalidation. That either won't work or could result in a lockup as well.

Question for Emily and Monk: Do we support power gating of the MMHUB with 
SRIOV? I don't think so and when that's correct we could just drop this patch.

Regards,
Christian.

On 20.11.19 at 10:14, Changfeng.Zhu wrote:

From: changzhu 

It may lose gpuvm invalidate acknowldege state across power-gating off
cycle. To avoid this issue in virt invalidation, add semaphore acquire
before invalidation and semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
   3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
*adev, uint32_t reg, uint32_t v)
   
   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
   {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;

spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
---

RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

>>Question for Emily and Monk: Do we support power gating of the MMHUB with 
>>SRIOV? I don't think so and when that's correct we could just drop this patch.

No power gating is allowed to be controlled by a VF in a guest VM.

It is the hypervisor driver's (GIM) responsibility to decide when our hardware
can enter a power cycle (e.g. BACO reset), and we have a software mechanism to
make sure a power-gating-off cycle only happens when all engines are idle (or
any of them has hung).

And even if some engine isn't hung (e.g. the KIQ is still doing things like
reading a register or gpu_flush_tlb, etc.), if GIM decides to power off the
GPU (BACO reset), that's okay for the KIQ, since after BACO all engines are
re-initialized anyway.

Thanks 

/Monk

-Original Message-
From: Christian König
Sent: 2019-11-20 19:24
To: Zhu, Changfeng ; Koenig, Christian 
; Xiao, Jack ; Zhou1, Tao 
; Huang, Ray ; Huang, Shimmer 
; amd-gfx@lists.freedesktop.org; Deng, Emily 
; Liu, Monk 
Subject: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in
amdgpu_virt

Hi Changfeng,

[adding Monk and Emily as well].

I thought more about this and came to the conclusion that this won't work and 
might result in a lockup as well.

We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic 
read/modify/write cycle since we found that the invalidation engine is resetted 
with every world switch.

Now accessing the semaphore registers is not atomic any more and we could have 
a world switch in between grabbing the semaphore and sending the VM 
invalidation. That either won't work or could result in a lockup as well.

Question for Emily and Monk: Do we support power gating of the MMHUB with 
SRIOV? I don't think so and when that's correct we could just drop this patch.

Regards,
Christian.

On 20.11.19 at 10:14, Changfeng.Zhu wrote:
> From: changzhu 
>
> The GPUVM invalidate acknowledge state may be lost across a power-gating off 
> cycle. To avoid this issue in virt invalidation, add a semaphore acquire 
> before invalidation and a semaphore release after invalidation.
>
> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>   3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index f04eb1a64271..70ffaf91cd12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
> *adev, uint32_t reg, uint32_t v)
>   
>   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t reg0, uint32_t reg1,
> - uint32_t ref, uint32_t mask)
> + uint32_t ref, uint32_t mask,
> + uint32_t sem)
>   {
>   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   struct amdgpu_ring *ring = &kiq->ring;
> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t seq;
>   
>   spin_lock_irqsave(&kiq->ring_lock, flags);
> - amdgpu_ring_alloc(ring, 32);
> + amdgpu_ring_alloc(ring, 60);
> +
> + /*
> +  * It may lose gpuvm invalidate acknowldege state across power-gating
> +  * off cycle, add semaphore acquire before invalidation and semaphore
> +  * release after invalidation to avoid entering power gated state
> +  * to WA the Issue
> +  */
> +
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
> +
>   amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>   ref, mask);
> + /*
> +  * add semaphore release after invalidation,
> +  * write with 0 means semaphore release
> +  */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_wreg(ring, sem, 0);
> +
>   amdgpu_fence_emit_polling(ring, &seq);
>   amdgpu_ring_commit(ring);
>   spin_unlock_irqrestore(&kiq->ring_lock, flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index b0b2bdc750df..bda6a2f37dc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
> uint32_t reg);
>   void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, 
> uint32_t v);
>   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>

RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

>>> We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic 
>>> read/modify/write cycle since we found that the invalidation engine is 
>>> reset with every world switch.
>>> Now accessing the semaphore registers is not atomic any more and we could have 
>>> a world switch in between grabbing the semaphore and sending the VM 
>>> invalidation. That either won't work or could result in a lockup as well.

Yeah, Christian is right on that.
When the VM invalidation is done through the KIQ, the whole procedure is atomic 
(no world switch can interrupt it). If you split the work into two (or more) KIQ 
jobs, a world switch can happen between any two of them (a busy KIQ blocks the 
world switch, but once the KIQ is idle a world switch can happen immediately).
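
The point about splitting the work can be illustrated with a small toy model.
This is only a sketch under the assumption stated above (a world switch can
happen between KIQ jobs but never inside one); the toy_* names are
hypothetical and are not part of amdgpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in amdgpu. */
enum toy_op { TOY_SEM_ACQUIRE, TOY_INVALIDATE, TOY_SEM_RELEASE };

struct toy_job {
	const enum toy_op *ops;
	size_t num_ops;
};

/*
 * Count the job boundaries at which a world switch could land while the
 * semaphore is still held. Assumption: a busy KIQ blocks the world switch,
 * so a switch is only possible once a job has finished.
 */
static unsigned int toy_unsafe_switch_points(const struct toy_job *jobs,
					     size_t num_jobs)
{
	bool sem_held = false;
	unsigned int unsafe = 0;

	assert(jobs != NULL || num_jobs == 0);

	for (size_t j = 0; j < num_jobs; j++) {
		for (size_t i = 0; i < jobs[j].num_ops; i++) {
			switch (jobs[j].ops[i]) {
			case TOY_SEM_ACQUIRE:
				sem_held = true;
				break;
			case TOY_SEM_RELEASE:
				sem_held = false;
				break;
			case TOY_INVALIDATE:
				break;
			}
		}
		/* The KIQ is idle here, so a world switch may happen now. */
		if (sem_held)
			unsafe++;
	}
	return unsafe;
}
```

With acquire, invalidate and release in one job the count is zero; with one
job per step there are two boundaries at which the semaphore is held across a
possible world switch.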

I'm still trying to understand what problem you are running into.

-Original Message-
From: Christian König
Sent: November 20, 2019 19:24
To: Zhu, Changfeng; Koenig, Christian; Xiao, Jack; Zhou1, Tao; Huang, Ray;
Huang, Shimmer; amd-gfx@lists.freedesktop.org; Deng, Emily; Liu, Monk
Subject: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
amdgpu_virt

Hi Changfeng,

[adding Monk and Emily as well].

I thought more about this and came to the conclusion that this won't work and 
might result in a lockup as well.

We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic 
read/modify/write cycle since we found that the invalidation engine is reset 
with every world switch.

Now accessing the semaphore registers is not atomic any more and we could have 
a world switch in between grabbing the semaphore and sending the VM 
invalidation. That either won't work or could result in a lockup as well.

Question for Emily and Monk: Do we support power gating of the MMHUB with 
SRIOV? I don't think so and when that's correct we could just drop this patch.

Regards,
Christian.

Am 20.11.19 um 10:14 schrieb Changfeng.Zhu:
> From: changzhu 
>
> The GPUVM invalidate acknowledge state may be lost across a power-gating off 
> cycle. To avoid this issue in virt invalidation, add a semaphore acquire 
> before invalidation and a semaphore release after invalidation.
>
> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
> Signed-off-by: changzhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
>   3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index f04eb1a64271..70ffaf91cd12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
> *adev, uint32_t reg, uint32_t v)
>   
>   void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t reg0, uint32_t reg1,
> - uint32_t ref, uint32_t mask)
> + uint32_t ref, uint32_t mask,
> + uint32_t sem)
>   {
>   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>   struct amdgpu_ring *ring = &kiq->ring;
> @@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>   uint32_t seq;
>   
>   spin_lock_irqsave(&kiq->ring_lock, flags);
> - amdgpu_ring_alloc(ring, 32);
> + amdgpu_ring_alloc(ring, 60);
> +
> + /*
> +  * It may lose gpuvm invalidate acknowldege state across power-gating
> +  * off cycle, add semaphore acquire before invalidation and semaphore
> +  * release after invalidation to avoid entering power gated state
> +  * to WA the Issue
> +  */
> +
> + /* a read return value of 1 means semaphore acuqire */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
> +
>   amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>   ref, mask);
> + /*
> +  * add semaphore release after invalidation,
> +  * write with 0 means semaphore release
> +  */
> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
> + ring->funcs->vmhub == AMDGPU_MMHUB_1)
> + amdgpu_ring_emit_wreg(ring, sem, 0);
> +
>   amdgpu_fence_emit_polling(ring, &seq);
>   amdgpu_ring_commit(ring);
>   spin_unlock_irqrestore(&kiq->ring_lock, flags);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> index b0b2bdc750df..bda6a2f37dc0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
> uint32_t reg);
>   void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, 
> uint32_t v);
>   void

Re: RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


Hi Monk,

this is a fix for power gating the MMHUB.

The basic problem is that the MMHUB can power gate while an invalidation is 
in progress, which loses all bits in the ACK register and so deadlocks 
the engine waiting for the invalidation to finish.


This bug is hit immediately when we enable power gating of the MMHUB.
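
The failure mode and the intent of the workaround can be sketched with a toy
register model. Everything below is illustrative: the toy_* names are
hypothetical, and the model simply assumes that holding the semaphore keeps
the hub from power gating, which is what the patch comment describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state for one invalidation engine; not a real register layout. */
struct toy_hub {
	uint32_t sem;     /* 1 while the semaphore is held */
	uint32_t pending; /* one bit per VMID with an invalidation in flight */
	uint32_t ack;     /* one bit per VMID whose invalidation completed */
};

/* Assumption: the hub refuses to power gate while the semaphore is held. */
static bool toy_try_power_gate(struct toy_hub *hub)
{
	if (hub->sem)
		return false;
	/* Gating drops the in-flight request together with the ACK state. */
	hub->pending = 0;
	hub->ack = 0;
	return true;
}

/* The hardware finishes whatever is still pending. */
static void toy_hw_complete(struct toy_hub *hub)
{
	hub->ack |= hub->pending;
	hub->pending = 0;
}

/* Returns true if the ACK bit for vmid shows up after the flush. */
static bool toy_flush(struct toy_hub *hub, unsigned int vmid, bool use_sem)
{
	bool acked;

	assert(vmid < 32);

	if (use_sem)
		hub->sem = 1;             /* acquire: a read of 1 in the patch */
	hub->pending |= 1u << vmid;       /* request the invalidation */
	toy_try_power_gate(hub);          /* worst case: gating attempt now */
	toy_hw_complete(hub);
	acked = (hub->ack & (1u << vmid)) != 0;
	if (use_sem)
		hub->sem = 0;             /* release: write 0 */
	return acked;
}
```

Without the semaphore the gating attempt wipes the pending request, so the
ACK bit never appears and a waiter would spin forever; with the semaphore the
gating attempt is refused and the flush completes.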

Regards,
Christian.

Am 20.11.19 um 14:18 schrieb Liu, Monk:

Hi Changfeng,

First of all, there is no power-gating off cycle involved in AMDGPU SRIOV, 
since we don't allow a VF/VM to do such things, so I find it strange that you 
posted something like this, especially for the VEGA10 series, which doesn't 
seem to have any issue in the gpu_flush part.

Here are my questions for you:
1) Can you point me to the issue you experienced, and how to reproduce the bug?
2) If you did hit an issue, did you verify that your patch fixes it?

/Monk

-Original Message-
From: amd-gfx On Behalf Of Changfeng.Zhu
Sent: November 20, 2019 17:14
To: Koenig, Christian; Xiao, Jack; Zhou1, Tao; Huang, Ray; Huang, Shimmer;
amd-gfx@lists.freedesktop.org
Cc: Zhu, Changfeng
Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

From: changzhu 

The GPUVM invalidate acknowledge state may be lost across a power-gating off 
cycle. To avoid this issue in virt invalidation, add a semaphore acquire before 
invalidation and a semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
  3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32_t v)
  
  void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
  {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
  
  	spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
uint32_t reg);
  void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
  void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t rreg1,
-   uint32_t ref, uint32_t mask);
+   uint32_t ref, uint32_t mask,
+   uint32_t sem);
  int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,

RE: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Liu, Monk

Hi Changfeng,

First of all, there is no power-gating off cycle involved in AMDGPU SRIOV, 
since we don't allow a VF/VM to do such things, so I find it strange that you 
posted something like this, especially for the VEGA10 series, which doesn't 
seem to have any issue in the gpu_flush part.

Here are my questions for you:
1) Can you point me to the issue you experienced, and how to reproduce the bug?
2) If you did hit an issue, did you verify that your patch fixes it?

/Monk

-Original Message-
From: amd-gfx On Behalf Of Changfeng.Zhu
Sent: November 20, 2019 17:14
To: Koenig, Christian; Xiao, Jack; Zhou1, Tao; Huang, Ray; Huang, Shimmer;
amd-gfx@lists.freedesktop.org
Cc: Zhu, Changfeng
Subject: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

From: changzhu 

The GPUVM invalidate acknowledge state may be lost across a power-gating off 
cycle. To avoid this issue in virt invalidation, add a semaphore acquire before 
invalidation and a semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32_t v)
 
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
 {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t seq;
 
spin_lock_irqsave(&kiq->ring_lock, flags);
-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
uint32_t reg);
 void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t rreg1,
-   uint32_t ref, uint32_t mask);
+   uint32_t ref, uint32_t mask,
+   uint32_t sem);
 int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
!adev->in_gpu_reset) {
uint32_t req = hub->vm_inv_eng0_req + eng;
uint32_t ack = hub->vm_inv_eng0_ack + eng;
+   uint32_t sem = hub->vm_inv_eng0_sem + eng;
 
amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
-   1 << vmid);
+  1 << vmid,

[PATCH] drm/amdgpu: apply gpr/gds workaround before enabling GFX EDC mode

2019-11-20 Thread Hawking Zhang

gfx memory should be initialized before enabling
DED and FUE field in mmGB_EDC_MODE

Change-Id: I248a087364cbd9858cba32a70be456af3f07c90d
Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 0d6965ba7a86..68359059e390 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4213,10 +4213,6 @@ static int gfx_v9_0_ecc_late_init(void *handle)
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
int r;
 
-   r = amdgpu_gfx_ras_late_init(adev);
-   if (r)
-   return r;
-
r = gfx_v9_0_do_edc_gds_workarounds(adev);
if (r)
return r;
@@ -4226,6 +4222,10 @@ static int gfx_v9_0_ecc_late_init(void *handle)
if (r)
return r;
 
+   r = amdgpu_gfx_ras_late_init(adev);
+   if (r)
+   return r;
+
return 0;
 }
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu/psp: declare PSP TA firmware

2019-11-20 Thread Mike Lothian

Hi

Any idea where I can find those blobs?

Cheers

Mike

On Wed, 16 Oct 2019 at 03:48, chen gong  wrote:
>
> Add PSP TA firmware declaration for raven raven2 picasso
>
> Signed-off-by: chen gong 
> ---
>  drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> index b96484a..b345e69 100644
> --- a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c
> @@ -40,6 +40,9 @@
>  MODULE_FIRMWARE("amdgpu/raven_asd.bin");
>  MODULE_FIRMWARE("amdgpu/picasso_asd.bin");
>  MODULE_FIRMWARE("amdgpu/raven2_asd.bin");
> +MODULE_FIRMWARE("amdgpu/picasso_ta.bin");
> +MODULE_FIRMWARE("amdgpu/raven2_ta.bin");
> +MODULE_FIRMWARE("amdgpu/raven_ta.bin");
>
>  static int psp_v10_0_init_microcode(struct psp_context *psp)
>  {
> --
> 2.7.4
>

Re: [PATCH 2/2] drm/amdgpu: invalidate mmhub semphore workaround in gmc9/gmc10

2019-11-20 Thread Christian König


Am 20.11.19 um 10:44 schrieb Changfeng.Zhu:

From: changzhu 

The GPUVM invalidate acknowledge state may be lost across a power-gating off
cycle. To avoid this issue in gmc9/gmc10 invalidation, add a semaphore acquire
before invalidation and a semaphore release after invalidation.

After adding the semaphore acquire before invalidation, the semaphore
register becomes read-only if another process tries to acquire the semaphore.
It is then not possible to release the semaphore, which may cause a
deadlock. If this deadlock happens, it needs a semaphore firmware fix.


Please remove the DRM_WARN_ONCE, that looks like overkill to me.

And I'm not sure how urgent this issue is. We could also wait a few 
more days and see if the hw guys figure out why this locks up on the GFX 
ring.


Regards,
Christian.



Change-Id: I9942a2f451265c1f1038ccfe2f70042c7c8118af
Signed-off-by: changzhu 
---
  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 49 ++
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 49 ++
  drivers/gpu/drm/amd/amdgpu/soc15.h |  4 +--
  3 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index af2615ba52aa..685d0d5ef31e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -234,6 +234,24 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
const unsigned eng = 17;
unsigned int i;
  
+	spin_lock(&adev->gmc.invalidate_lock);

+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   for (i = 0; i < adev->usec_timeout; i++) {
+   /* a read return value of 1 means semaphore acuqire */
+   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng);
+   if (tmp & 0x1)
+   break;
+   udelay(1);
+   }
+
+   if (i >= adev->usec_timeout)
+   DRM_ERROR("Timeout waiting for sem acquire in VM flush!\n");
+
WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, tmp);
  
  	/*

@@ -253,6 +271,14 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
udelay(1);
}
  
+	/*

+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   WREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng, 0);
+
+   spin_unlock(&adev->gmc.invalidate_lock);
+
if (i < adev->usec_timeout)
return;
  
@@ -338,6 +364,21 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,

uint32_t req = gmc_v10_0_get_invalidate_req(vmid, 0);
unsigned eng = ring->vm_inv_eng;
  
+	/*

+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1) {
+   amdgpu_ring_emit_reg_wait(ring,
+ hub->vm_inv_eng0_sem + eng, 0x1, 0x1);
+   DRM_WARN_ONCE("Adding semaphore may cause deadlock and it needs 
firmware fix\n");
+   }
+
amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_lo32 + (2 * vmid),
  lower_32_bits(pd_addr));
  
@@ -348,6 +389,14 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct amdgpu_ring *ring,

hub->vm_inv_eng0_ack + eng,
req, 1 << vmid);
  
+	/*

+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_sem + eng, 0);
+
return pd_addr;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

index 1ae59af7836a..c4118cbb0fbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -456,6 +456,24 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
  
  	spin_lock(&adev->gmc.invalidate_lock);

+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   for (j = 0; j <

Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Christian König


Hi Changfeng,

[adding Monk and Emily as well].

I thought more about this and came to the conclusion that this won't 
work and might result in a lockup as well.


We are using the KIQ on SRIOV for GPUVM invalidation because we need an 
atomic read/modify/write cycle since we found that the invalidation 
engine is reset with every world switch.


Now accessing the semaphore registers is not atomic any more and we 
could have a world switch in between grabbing the semaphore and sending 
the VM invalidation. That either won't work or could result in a lockup 
as well.


Question for Emily and Monk: Do we support power gating of the MMHUB 
with SRIOV? I don't think so and when that's correct we could just drop 
this patch.


Regards,
Christian.

Am 20.11.19 um 10:14 schrieb Changfeng.Zhu:

From: changzhu 

The GPUVM invalidate acknowledge state may be lost across a power-gating off
cycle. To avoid this issue in virt invalidation, add a semaphore acquire
before invalidation and a semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
  3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32_t v)
  
  void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,

uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
  {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
amdgpu_device *adev,
uint32_t seq;
  
  	spin_lock_irqsave(&kiq->ring_lock, flags);

-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
uint32_t reg);
  void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t 
v);
  void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t rreg1,
-   uint32_t ref, uint32_t mask);
+   uint32_t ref, uint32_t mask,
+   uint32_t sem);
  int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
!adev->in_gpu_reset) {
uint32_t req = hub->vm_inv_eng0_req + eng;
uint32_t ack = hub->vm_inv_eng0_ack + eng;
+   uint32_t sem = hub->vm_inv_eng0_sem + eng;
  
  		amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,

-   1 << vmid);
+

RE: [PATCH v2 0/3] RAS support for mmhub

2019-11-20 Thread Zhang, Hawking

Series is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Dennis Li  
Sent: November 20, 2019 18:49
To: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Zhou1, Tao ; Zhang, Hawking 
; Chen, Guchun 
Cc: Li, Dennis 
Subject: [PATCH v2 0/3] RAS support for mmhub

This set of patches is a continuation of RAS enablement patches for AMDGPU. 

1. The new struct soc15_ras_field_entry will be reused by gfx, mmhub and other 
IP blocks.
2. Refine the query function of the RAS error counter for VG20, and add code to 
help the user locate which sub-block of mmhub caused the error.
3. Implement the query function of the RAS error counter for Mi100.
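
As a rough illustration of the table-driven approach behind item 1, the
sketch below walks a table of per-sub-block field descriptions and sums the
correctable (SEC) and uncorrectable (DED) counts from raw register values.
The toy_* names, the field layout and the register indexing are assumptions
made for this example; they are not the definition of soc15_ras_field_entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one RAS field table entry. */
struct toy_ras_field {
	const char *name;   /* sub-block name, e.g. for log messages */
	uint32_t reg;       /* index into the toy register array */
	uint32_t sec_mask;  /* correctable error count field */
	uint32_t sec_shift;
	uint32_t ded_mask;  /* uncorrectable error count field, 0 if absent */
	uint32_t ded_shift;
};

/*
 * Accumulate SEC and DED counts over all table entries. An entry without a
 * DED field uses mask 0 and shift 0, so it contributes nothing to *ded.
 */
static void toy_count_errors(const struct toy_ras_field *fields, size_t n,
			     const uint32_t *regs,
			     uint32_t *sec, uint32_t *ded)
{
	assert(fields != NULL && regs != NULL && sec != NULL && ded != NULL);

	for (size_t i = 0; i < n; i++) {
		uint32_t v = regs[fields[i].reg];

		*sec += (v & fields[i].sec_mask) >> fields[i].sec_shift;
		*ded += (v & fields[i].ded_mask) >> fields[i].ded_shift;
	}
}
```

Because each entry carries its own name, a caller could also report which
sub-block contributed a non-zero count, which is the kind of detail item 2
asks for.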

v2:
1. Fix some comment issues.
2. Add IP name prefix for the local static variable and function.
3. Move the EDC_CNT registers' definition from mmhub_9_4 header files to 
mmhub_1_0 ones for vg20.

Dennis Li (3):
  drm/amdgpu: define soc15_ras_field_entry for reuse
  drm/amdgpu: refine query function of mmhub EDC counter in vg20
  drm/amdgpu: implement querying ras error count for mmhub9.4

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |  34 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   3 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c   | 232 
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c   | 253 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/soc15.h|  12 +
 .../include/asic_reg/mmhub/mmhub_1_0_offset.h |  16 ++
 .../asic_reg/mmhub/mmhub_1_0_sh_mask.h| 122 +
 .../asic_reg/mmhub/mmhub_9_4_0_offset.h   |  53 
 .../asic_reg/mmhub/mmhub_9_4_0_sh_mask.h  | 257 --
 10 files changed, 598 insertions(+), 386 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_offset.h
 delete mode 100644 drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_sh_mask.h

--
2.17.1


[PATCH v2 2/3] drm/amdgpu: refine query function of mmhub EDC counter in vg20

2019-11-20 Thread Dennis Li

Add code to print detailed EDC info for the sub-blocks of mmhub.

v2: Move the EDC_CNT registers' definition from mmhub_9_4 header
files to mmhub_1_0 ones. Add mmhub_v1_0_ prefix for the local
static variable and function.

Change-Id: I1d5b3df38caa8f0b437c96b78091662aaeaf264b
Signed-off-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c   | 232 
 .../include/asic_reg/mmhub/mmhub_1_0_offset.h |  16 ++
 .../asic_reg/mmhub/mmhub_1_0_sh_mask.h| 122 +
 .../asic_reg/mmhub/mmhub_9_4_0_offset.h   |  53 
 .../asic_reg/mmhub/mmhub_9_4_0_sh_mask.h  | 257 --
 5 files changed, 318 insertions(+), 362 deletions(-)
 delete mode 100644 drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_offset.h
 delete mode 100644 drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_sh_mask.h

diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 6965e1e6fa9e..d7575ac27038 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -27,17 +27,13 @@
 #include "mmhub/mmhub_1_0_offset.h"
 #include "mmhub/mmhub_1_0_sh_mask.h"
 #include "mmhub/mmhub_1_0_default.h"
-#include "mmhub/mmhub_9_4_0_offset.h"
 #include "vega10_enum.h"
-
+#include "soc15.h"
 #include "soc15_common.h"
 
 #define mmDAGB0_CNTL_MISC2_RV 0x008f
 #define mmDAGB0_CNTL_MISC2_RV_BASE_IDX 0
 
-#define EA_EDC_CNT_MASK 0x3
-#define EA_EDC_CNT_SHIFT 0x2
-
 u64 mmhub_v1_0_get_fb_location(struct amdgpu_device *adev)
 {
u64 base = RREG32_SOC15(MMHUB, 0, mmMC_VM_FB_LOCATION_BASE);
@@ -562,59 +558,191 @@ void mmhub_v1_0_get_clockgating(struct amdgpu_device 
*adev, u32 *flags)
*flags |= AMD_CG_SUPPORT_MC_LS;
 }
 
+static const struct soc15_ras_field_entry mmhub_v1_0_ras_fields[] = {
+   { "MMEA0_DRAMRD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMRD_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMRD_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMWR_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMWR_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMWR_DATAMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMWR_DATAMEM_DED_COUNT),
+   },
+   { "MMEA0_RRET_TAGMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, RRET_TAGMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, RRET_TAGMEM_DED_COUNT),
+   },
+   { "MMEA0_WRET_TAGMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, WRET_TAGMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, WRET_TAGMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMRD_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMRD_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_DRAMWR_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, DRAMWR_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IORD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, IORD_CMDMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IOWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, IOWR_CMDMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IOWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT_VG20, IOWR_DATAMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_GMIRD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT2_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIRD_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIRD_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT2_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIWR_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIWR_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT2_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIWR_DATAMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIWR_DATAMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIRD_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT2_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIRD_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_GMIWR_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, 
mmMMEA0_EDC_CNT2_VG20),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2_VG20, GMIWR_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA1_DRAMRD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0,

[PATCH v2 0/3] RAS support for mmhub

2019-11-20 Thread Dennis Li

This set of patches is a continuation of RAS enablement patches for AMDGPU. 

1. The new struct soc15_ras_field_entry will be reused by gfx, mmhub and other 
IP blocks.
2. Refine the query function of the RAS error counter for VG20, and add code to 
help the user locate which sub-block of mmhub caused the error.
3. Implement the query function of the RAS error counter for MI100.

v2:
1. Fix some comment issues.
2. Add IP name prefix for the local static variable and function.
3. Move the EDC_CNT register definitions from the mmhub_9_4 header files to 
the mmhub_1_0 ones for vg20.
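
As a minimal sketch of the table-driven decode this series relies on (item 2
above), the following C fragment walks a table of mask/shift pairs and
accumulates SEC and DED counts from one register value. The struct ras_field
type, the example_fields table and its masks and shifts are hypothetical
stand-ins for illustration, not the real soc15_ras_field_entry contents or the
EDC_CNT register layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the decode-related members of
 * struct soc15_ras_field_entry; register addressing is left out. */
struct ras_field {
	const char *name;
	uint32_t sec_count_mask;
	uint32_t sec_count_shift;
	uint32_t ded_count_mask;
	uint32_t ded_count_shift;
};

/* Illustrative layout only: these masks and shifts are made up. */
static const struct ras_field example_fields[] = {
	{ "BLOCK_A", 0x00000003, 0, 0x0000000c, 2 },
	{ "BLOCK_B", 0x00000030, 4, 0x000000c0, 6 },
	{ "BLOCK_C", 0x00000300, 8, 0, 0 }, /* single counter, no DED field */
};

#define N_FIELDS (sizeof(example_fields) / sizeof(example_fields[0]))

/* Extract every counter from one register value, report the sub-block
 * that owns each non-zero counter, and add it to the running totals. */
static void decode_ras_counts(uint32_t value, uint32_t *sec, uint32_t *ded)
{
	size_t i;

	for (i = 0; i < N_FIELDS; i++) {
		const struct ras_field *f = &example_fields[i];
		uint32_t s = (value & f->sec_count_mask) >> f->sec_count_shift;
		uint32_t d = (value & f->ded_count_mask) >> f->ded_count_shift;

		if (s) {
			printf("SubBlock %s, SEC %u\n", f->name, (unsigned int)s);
			*sec += s;
		}
		if (d) {
			printf("SubBlock %s, DED %u\n", f->name, (unsigned int)d);
			*ded += d;
		}
	}
}
```

An entry whose second mask/shift pair is 0, 0 simply never contributes a DED
count, which is how a table of this shape can describe single-counter fields.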

Dennis Li (3):
  drm/amdgpu: define soc15_ras_field_entry for reuse
  drm/amdgpu: refine query function of mmhub EDC counter in vg20
  drm/amdgpu: implement querying ras error count for mmhub9.4

 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c |  34 +--
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c |   3 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c   | 232 
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c   | 253 -
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.h   |   2 +
 drivers/gpu/drm/amd/amdgpu/soc15.h|  12 +
 .../include/asic_reg/mmhub/mmhub_1_0_offset.h |  16 ++
 .../asic_reg/mmhub/mmhub_1_0_sh_mask.h| 122 +
 .../asic_reg/mmhub/mmhub_9_4_0_offset.h   |  53 
 .../asic_reg/mmhub/mmhub_9_4_0_sh_mask.h  | 257 --
 10 files changed, 598 insertions(+), 386 deletions(-)
 delete mode 100644 
drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_offset.h
 delete mode 100644 
drivers/gpu/drm/amd/include/asic_reg/mmhub/mmhub_9_4_0_sh_mask.h

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH v2 1/3] drm/amdgpu: define soc15_ras_field_entry for reuse

2019-11-20 Thread Dennis Li

The struct soc15_ras_field_entry will be reused by
other IPs, such as mmhub and gc.

v2: rename ras_subblock_regs to gc_ras_fields_vg20,
because future ASICs may have a different table.

Change-Id: I6c3388a09b5fbf927ad90fcd626baa448d1681a6
Signed-off-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 34 +--
 drivers/gpu/drm/amd/amdgpu/soc15.h| 12 ++
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index c7ae685d6f74..8073fcd4720e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -131,18 +131,6 @@ MODULE_FIRMWARE("amdgpu/renoir_rlc.bin");
 #define mmTCP_CHAN_STEER_5_ARCT
0x0b0c
 #define mmTCP_CHAN_STEER_5_ARCT_BASE_IDX   
0
 
-struct ras_gfx_subblock_reg {
-   const char *name;
-   uint32_t hwip;
-   uint32_t inst;
-   uint32_t seg;
-   uint32_t reg_offset;
-   uint32_t sec_count_mask;
-   uint32_t sec_count_shift;
-   uint32_t ded_count_mask;
-   uint32_t ded_count_shift;
-};
-
 enum ta_ras_gfx_subblock {
/*CPC*/
TA_RAS_BLOCK__GFX_CPC_INDEX_START = 0,
@@ -5487,7 +5475,7 @@ static int gfx_v9_0_priv_inst_irq(struct amdgpu_device 
*adev,
 }
 
 
-static const struct ras_gfx_subblock_reg ras_subblock_regs[] = {
+static const struct soc15_ras_field_entry gc_ras_fields_vg20[] = {
{ "CPC_SCRATCH", SOC15_REG_ENTRY(GC, 0, mmCPC_EDC_SCRATCH_CNT),
  SOC15_REG_FIELD(CPC_EDC_SCRATCH_CNT, SEC_COUNT),
  SOC15_REG_FIELD(CPC_EDC_SCRATCH_CNT, DED_COUNT)
@@ -6146,29 +6134,29 @@ static int __get_ras_error_count(const struct 
soc15_reg_entry *reg,
uint32_t i;
uint32_t sec_cnt, ded_cnt;
 
-   for (i = 0; i < ARRAY_SIZE(ras_subblock_regs); i++) {
-   if(ras_subblock_regs[i].reg_offset != reg->reg_offset ||
-   ras_subblock_regs[i].seg != reg->seg ||
-   ras_subblock_regs[i].inst != reg->inst)
+   for (i = 0; i < ARRAY_SIZE(gc_ras_fields_vg20); i++) {
+   if(gc_ras_fields_vg20[i].reg_offset != reg->reg_offset ||
+   gc_ras_fields_vg20[i].seg != reg->seg ||
+   gc_ras_fields_vg20[i].inst != reg->inst)
continue;
 
sec_cnt = (value &
-   ras_subblock_regs[i].sec_count_mask) >>
-   ras_subblock_regs[i].sec_count_shift;
+   gc_ras_fields_vg20[i].sec_count_mask) >>
+   gc_ras_fields_vg20[i].sec_count_shift;
if (sec_cnt) {
DRM_INFO("GFX SubBlock %s, Instance[%d][%d], SEC %d\n",
-   ras_subblock_regs[i].name,
+   gc_ras_fields_vg20[i].name,
se_id, inst_id,
sec_cnt);
*sec_count += sec_cnt;
}
 
ded_cnt = (value &
-   ras_subblock_regs[i].ded_count_mask) >>
-   ras_subblock_regs[i].ded_count_shift;
+   gc_ras_fields_vg20[i].ded_count_mask) >>
+   gc_ras_fields_vg20[i].ded_count_shift;
if (ded_cnt) {
DRM_INFO("GFX SubBlock %s, Instance[%d][%d], DED %d\n",
-   ras_subblock_regs[i].name,
+   gc_ras_fields_vg20[i].name,
se_id, inst_id,
ded_cnt);
*ded_count += ded_cnt;
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.h 
b/drivers/gpu/drm/amd/amdgpu/soc15.h
index 9af6c6ffbfa2..344280b869c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.h
@@ -60,6 +60,18 @@ struct soc15_allowed_register_entry {
bool grbm_indexed;
 };
 
+struct soc15_ras_field_entry {
+   const char *name;
+   uint32_t hwip;
+   uint32_t inst;
+   uint32_t seg;
+   uint32_t reg_offset;
+   uint32_t sec_count_mask;
+   uint32_t sec_count_shift;
+   uint32_t ded_count_mask;
+   uint32_t ded_count_shift;
+};
+
 #define SOC15_REG_ENTRY(ip, inst, reg) ip##_HWIP, inst, reg##_BASE_IDX, reg
 
 #define SOC15_REG_ENTRY_OFFSET(entry)  
(adev->reg_offset[entry.hwip][entry.inst][entry.seg] + entry.reg_offset)
-- 
2.17.1


[PATCH v2 3/3] drm/amdgpu: implement querying ras error count for mmhub9.4

2019-11-20 Thread Dennis Li

Get mmhub error counter by accessing EDC_CNT registers.

v2: Add mmhub_v9_4_ prefix for local static variable and function

Change-Id: I728d4183a08707aaf0fc71d184e86322a681e725
Signed-off-by: Dennis Li 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |   3 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 253 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.h |   2 +
 3 files changed, 257 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index ee615d050837..5f4a6cdf83a7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -658,6 +658,9 @@ static void gmc_v9_0_set_mmhub_funcs(struct amdgpu_device 
*adev)
case CHIP_VEGA20:
adev->mmhub.funcs = &mmhub_v1_0_funcs;
break;
+   case CHIP_ARCTURUS:
+   adev->mmhub.funcs = &mmhub_v9_4_funcs;
+   break;
default:
break;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
index 2c5adfe803a2..6fe5c39e5581 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
@@ -21,6 +21,7 @@
  *
  */
 #include "amdgpu.h"
+#include "amdgpu_ras.h"
 #include "mmhub_v9_4.h"
 
 #include "mmhub/mmhub_9_4_1_offset.h"
@@ -29,7 +30,7 @@
 #include "athub/athub_1_0_offset.h"
 #include "athub/athub_1_0_sh_mask.h"
 #include "vega10_enum.h"
-
+#include "soc15.h"
 #include "soc15_common.h"
 
 #define MMHUB_NUM_INSTANCES2
@@ -651,3 +652,253 @@ void mmhub_v9_4_get_clockgating(struct amdgpu_device 
*adev, u32 *flags)
if (data & ATCL2_0_ATC_L2_MISC_CG__MEM_LS_ENABLE_MASK)
*flags |= AMD_CG_SUPPORT_MC_LS;
 }
+
+static const struct soc15_ras_field_entry mmhub_v9_4_ras_fields[] = {
+   { "MMEA0_DRAMRD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMRD_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMRD_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMWR_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMWR_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMWR_DATAMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMWR_DATAMEM_DED_COUNT),
+   },
+   { "MMEA0_RRET_TAGMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, RRET_TAGMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, RRET_TAGMEM_DED_COUNT),
+   },
+   { "MMEA0_WRET_TAGMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, WRET_TAGMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, WRET_TAGMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMRD_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMRD_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_DRAMWR_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, DRAMWR_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IORD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, IORD_CMDMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IOWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, IOWR_CMDMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_IOWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT, IOWR_DATAMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_GMIRD_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT2),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIRD_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIRD_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIWR_CMDMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT2),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIWR_CMDMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIWR_CMDMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIWR_DATAMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT2),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIWR_DATAMEM_SEC_COUNT),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIWR_DATAMEM_DED_COUNT),
+   },
+   { "MMEA0_GMIRD_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT2),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIRD_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_GMIWR_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT2),
+   SOC15_REG_FIELD(MMEA0_EDC_CNT2, GMIWR_PAGEMEM_SED_COUNT),
+   0, 0,
+   },
+   { "MMEA0_DRAMRD_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0, mmMMEA0_EDC_CNT3),
+   0, 0,
+   SOC15_REG_FIELD(MMEA0_EDC_CNT3, DRAMRD_PAGEMEM_DED_COUNT),
+   },
+   { "MMEA0_DRAMWR_PAGEMEM", SOC15_REG_ENTRY(MMHUB, 0,

[PATCH 2/2] drm/amdgpu: invalidate mmhub semphore workaround in gmc9/gmc10

2019-11-20 Thread Changfeng.Zhu

From: changzhu 

The gpuvm invalidate acknowledge state may be lost across a power-gating off
cycle. To avoid this issue in gmc9/gmc10 invalidation, add a semaphore acquire
before invalidation and a semaphore release after invalidation.

After the semaphore acquire is added before invalidation, the semaphore
register becomes read-only if another process tries to acquire the semaphore.
The semaphore then cannot be released, which may cause a deadlock. If this
deadlock happens, it needs a semaphore firmware fix.
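
The acquire, invalidate, release ordering described above can be sketched
against a simulated hub. This is only an illustration: struct fake_hub,
fake_read_sem() and flush_with_semaphore() are hypothetical names, and unlike
the patch (which logs an error and carries on after a timeout) this sketch
gives up when the semaphore cannot be acquired.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated hub, not the amdgpu register interface. */
struct fake_hub {
	uint32_t sem;            /* 0 = free, 1 = held */
	uint32_t inv_req;        /* last invalidation request written */
	unsigned int busy_reads; /* reads that report "not acquired" first */
};

/* Reading the semaphore register attempts an acquire; 1 means acquired. */
static uint32_t fake_read_sem(struct fake_hub *hub)
{
	if (hub->busy_reads) {
		hub->busy_reads--;
		return 0;
	}
	hub->sem = 1;
	return 1;
}

/* Poll for the semaphore with a bounded number of attempts, issue the
 * invalidation request, then release the semaphore by writing 0. */
static bool flush_with_semaphore(struct fake_hub *hub, uint32_t req,
				 unsigned int timeout)
{
	bool acquired = false;
	unsigned int i;

	for (i = 0; i < timeout; i++) {
		if (fake_read_sem(hub) & 0x1) {
			acquired = true;
			break;
		}
		/* a real driver would delay between polls here */
	}

	if (!acquired)
		return false; /* sketch-only choice: do not flush without it */

	hub->inv_req = req; /* invalidation request; ack polling omitted */
	hub->sem = 0;       /* writing 0 releases the semaphore */
	return true;
}
```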

Change-Id: I9942a2f451265c1f1038ccfe2f70042c7c8118af
Signed-off-by: changzhu 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 54 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 54 ++
 drivers/gpu/drm/amd/amdgpu/soc15.h |  4 +-
 3 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index af2615ba52aa..ff80a62ca514 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -234,6 +234,27 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
const unsigned eng = 17;
unsigned int i;
 
+   spin_lock(&adev->gmc.invalidate_lock);
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   if (vmhub == AMDGPU_MMHUB_0 ||
+   vmhub == AMDGPU_MMHUB_1) {
+   for (i = 0; i < adev->usec_timeout; i++) {
+   /* a read return value of 1 means semaphore acuqire */
+   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng);
+   if (tmp & 0x1)
+   break;
+   udelay(1);
+   }
+
+   if (i >= adev->usec_timeout)
+   DRM_ERROR("Timeout waiting for sem acquire in VM 
flush!\n");
+   }
+
WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, tmp);
 
/*
@@ -253,6 +274,16 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
udelay(1);
}
 
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (vmhub == AMDGPU_MMHUB_0 ||
+   vmhub == AMDGPU_MMHUB_1)
+   WREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng, 0);
+
+   spin_unlock(&adev->gmc.invalidate_lock);
+
if (i < adev->usec_timeout)
return;
 
@@ -338,6 +369,21 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
uint32_t req = gmc_v10_0_get_invalidate_req(vmid, 0);
unsigned eng = ring->vm_inv_eng;
 
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1) {
+   amdgpu_ring_emit_reg_wait(ring,
+ hub->vm_inv_eng0_sem + eng, 0x1, 0x1);
+   DRM_WARN_ONCE("Adding semaphore may cause deadlock and it needs 
firmware fix\n");
+   }
+
amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_lo32 + (2 * vmid),
  lower_32_bits(pd_addr));
 
@@ -348,6 +394,14 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
hub->vm_inv_eng0_ack + eng,
req, 1 << vmid);
 
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_sem + eng, 0);
+
return pd_addr;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 1ae59af7836a..92b8e234a586 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -456,6 +456,27 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
 
spin_lock(&adev->gmc.invalidate_lock);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   if (vmhub == AMDGPU_MMHUB_0 ||
+   vmhub ==

Re: [PATCH] drm/dsc: Return unsigned long on compute offset

2019-11-20 Thread Ville Syrjälä

On Tue, Nov 19, 2019 at 04:11:43PM -0500, Mikita Lipski wrote:
> 
> 
> On 19/11/2019 16:09, Mikita Lipski wrote:
> > 
> > 
> > On 19/11/2019 12:11, Ville Syrjälä wrote:
> >> On Tue, Nov 19, 2019 at 04:59:40PM +, Cornij, Nikola wrote:
> >>> If you're going to make all of them the same, then u64, please.
> >>>
> >>> This is because I'm not sure if calculations require 64-bit at some 
> >>> stage.
> >>
> >> If it does then it's already broken. Someone should probably figure out
> >> what's actually needed instead of shooting ducks with an icbm.
> >>
> 
> 
> Sorry, made a typo below. It was supposed to be "I don't think it is broken".

I mean that it's broken if it actually needs u64 when it's
currently using unsigned long. So u64 is either overkill or the
code is currently broken.
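
For context, the patch under discussion was motivated by comparing an int
with an unsigned long. A small sketch of why that comparison is hazardous in
C follows; naive_less() and safe_less() are hypothetical helper names, and the
explicit cast in naive_less() only spells out the conversion the language
applies implicitly when the two types are mixed.

```c
#include <stdbool.h>

/* What `a < b` does when a is int and b is unsigned long: the int is
 * converted to unsigned long first, so negative values become huge. */
static bool naive_less(int a, unsigned long b)
{
	return (unsigned long)a < b;
}

/* Sign-aware comparison: any negative int is below any unsigned value. */
static bool safe_less(int a, unsigned long b)
{
	return a < 0 || (unsigned long)a < b;
}
```

With these helpers, naive_less(-1, 1) is false while safe_less(-1, 1) is true,
which is why using one unsigned type throughout avoids the problem.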

> 
> > I don't think it is not broken, cause I'm currently testing DSC.
> > The patch I sent earlier simply fixes the error of comparing signed and 
> > unsigned variables.
> > 
> > We can then submit a second patch addressing the issue of using unsigned 
> > long int instead of u32. Also, since the variables in drm_dsc_config 
> > structure are all of type u8 and u16, the calculation values shouldn't 
> > exceed the size of u32.
> > 
> > Thanks
> > 
> >>>
> >>> -Original Message-
> >>> From: Lipski, Mikita 
> >>> Sent: November 19, 2019 10:08 AM
> >>> To: Ville Syrjälä ; Lipski, Mikita 
> >>> 
> >>> Cc: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> >>> Cornij, Nikola 
> >>> Subject: Re: [PATCH] drm/dsc: Return unsigned long on compute offset
> >>>
> >>>
> >>>
> >>> On 19/11/2019 09:56, Ville Syrjälä wrote:
>  On Tue, Nov 19, 2019 at 09:45:26AM -0500, mikita.lip...@amd.com wrote:
> > From: Mikita Lipski 
> >
> > We shouldn't compare int with unsigned long to find the max value and
> > since we are not expecting negative value returned from
> > compute_offset we should make this function return unsigned long so
> > we can compare the values when computing rc parameters.
> 
>  Why are there other unsigned longs in dsc parameter computation in the
>  first place?
> >>>
> >>> I believe it was initially set to be unsigned long for variable 
> >>> consistency, when we ported intel_compute_rc_parameters into 
> >>> drm_dsc_compute_rc_parameters. But now that I look at it, we can 
> >>> actually just set them to u32 or u64, as nothing should exceed that.
> 
> >
> > Cc: Nikola Cornij 
> > Cc: Harry Wentland 
> > Signed-off-by: Mikita Lipski 
> > ---
> >    drivers/gpu/drm/drm_dsc.c | 6 +++---
> >    1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_dsc.c b/drivers/gpu/drm/drm_dsc.c
> > index 74f3527f567d..ec40604ab6a2 100644
> > --- a/drivers/gpu/drm/drm_dsc.c
> > +++ b/drivers/gpu/drm/drm_dsc.c
> > @@ -245,11 +245,11 @@ void drm_dsc_pps_payload_pack(struct 
> > drm_dsc_picture_parameter_set *pps_payload,
> >    }
> >    EXPORT_SYMBOL(drm_dsc_pps_payload_pack);
> > -static int compute_offset(struct drm_dsc_config *vdsc_cfg, int 
> > pixels_per_group,
> > +static unsigned long compute_offset(struct drm_dsc_config 
> > *vdsc_cfg, int pixels_per_group,
> >    int groups_per_line, int grpcnt)
> >    {
> > -    int offset = 0;
> > -    int grpcnt_id = DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay, 
> > pixels_per_group);
> > +    unsigned long offset = 0;
> > +    unsigned long grpcnt_id = 
> > DIV_ROUND_UP(vdsc_cfg->initial_xmit_delay, pixels_per_group);
> >    if (grpcnt <= grpcnt_id)
> >    offset = DIV_ROUND_UP(grpcnt * pixels_per_group * 
> > vdsc_cfg->bits_per_pixel, 16);
> > -- 
> > 2.17.1
> >
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> >>>
> >>> -- 
> >>> Thanks,
> >>> Mikita Lipski
> >>> Software Engineer 2, AMD
> >>> mikita.lip...@amd.com
> >>
> > 
> 
> -- 
> Thanks,
> Mikita Lipski
> Software Engineer 2, AMD
> mikita.lip...@amd.com

-- 
Ville Syrjälä
Intel

Recall: [PATCH 2/2] drm/amdgpu: invalidate mmhub semphore workaround in gmc9/gmc10

2019-11-20 Thread Zhu, Changfeng

Zhu, Changfeng would like to recall the message, "[PATCH 2/2] drm/amdgpu: 
invalidate mmhub semphore workaround in gmc9/gmc10".

[PATCH 2/2] drm/amdgpu: invalidate mmhub semphore workaround in gmc9/gmc10

2019-11-20 Thread Changfeng.Zhu

From: changzhu 

The gpuvm invalidate acknowledge state may be lost across a power-gating off
cycle. To avoid this issue in gmc9/gmc10 invalidation, add a semaphore acquire
before invalidation and a semaphore release after invalidation.

After the semaphore acquire is added before invalidation, the semaphore
register becomes read-only if another process tries to acquire the semaphore.
The semaphore then cannot be released, which may cause a deadlock. If this
deadlock happens, it needs a semaphore firmware fix.

Change-Id: I9942a2f451265c1f1038ccfe2f70042c7c8118af
Signed-off-by: changzhu 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 49 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 49 ++
 drivers/gpu/drm/amd/amdgpu/soc15.h |  4 +--
 3 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index af2615ba52aa..685d0d5ef31e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -234,6 +234,24 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
const unsigned eng = 17;
unsigned int i;
 
+   spin_lock(&adev->gmc.invalidate_lock);
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   for (i = 0; i < adev->usec_timeout; i++) {
+   /* a read return value of 1 means semaphore acuqire */
+   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng);
+   if (tmp & 0x1)
+   break;
+   udelay(1);
+   }
+
+   if (i >= adev->usec_timeout)
+   DRM_ERROR("Timeout waiting for sem acquire in VM flush!\n");
+
WREG32_NO_KIQ(hub->vm_inv_eng0_req + eng, tmp);
 
/*
@@ -253,6 +271,14 @@ static void gmc_v10_0_flush_vm_hub(struct amdgpu_device 
*adev, uint32_t vmid,
udelay(1);
}
 
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   WREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng, 0);
+
+   spin_unlock(&adev->gmc.invalidate_lock);
+
if (i < adev->usec_timeout)
return;
 
@@ -338,6 +364,21 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
uint32_t req = gmc_v10_0_get_invalidate_req(vmid, 0);
unsigned eng = ring->vm_inv_eng;
 
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1) {
+   amdgpu_ring_emit_reg_wait(ring,
+ hub->vm_inv_eng0_sem + eng, 0x1, 0x1);
+   DRM_WARN_ONCE("Adding semaphore may cause deadlock and it needs 
firmware fix\n");
+   }
+
amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_lo32 + (2 * vmid),
  lower_32_bits(pd_addr));
 
@@ -348,6 +389,14 @@ static uint64_t gmc_v10_0_emit_flush_gpu_tlb(struct 
amdgpu_ring *ring,
hub->vm_inv_eng0_ack + eng,
req, 1 << vmid);
 
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, hub->vm_inv_eng0_sem + eng, 0);
+
return pd_addr;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 1ae59af7836a..c4118cbb0fbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -456,6 +456,24 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
}
 
spin_lock(&adev->gmc.invalidate_lock);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+   for (j = 0; j < adev->usec_timeout; j++) {
+   /* a read return value of 1 means semaphore acuqire */
+   tmp = RREG32_NO_KIQ(hub->vm_inv_eng0_sem + eng);
+   if (tmp & 0x1)
+   break;
+   udelay(1);
+   }
+
+   if (j >=

[PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

2019-11-20 Thread Changfeng.Zhu

From: changzhu 

The gpuvm invalidate acknowledge state may be lost across a power-gating off
cycle. To avoid this issue in virt invalidation, add a semaphore acquire
before invalidation and a semaphore release after invalidation.

Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
Signed-off-by: changzhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c|  3 ++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f04eb1a64271..70ffaf91cd12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
uint32_t reg, uint32_t v)
 
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t reg1,
-   uint32_t ref, uint32_t mask)
+   uint32_t ref, uint32_t mask,
+   uint32_t sem)
 {
struct amdgpu_kiq *kiq = &adev->gfx.kiq;
struct amdgpu_ring *ring = &kiq->ring;
@@ -144,9 +145,30 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
amdgpu_device *adev,
uint32_t seq;
 
spin_lock_irqsave(&kiq->ring_lock, flags);
-   amdgpu_ring_alloc(ring, 32);
+   amdgpu_ring_alloc(ring, 60);
+
+   /*
+* It may lose gpuvm invalidate acknowldege state across power-gating
+* off cycle, add semaphore acquire before invalidation and semaphore
+* release after invalidation to avoid entering power gated state
+* to WA the Issue
+*/
+
+   /* a read return value of 1 means semaphore acuqire */
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
+
amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
ref, mask);
+   /*
+* add semaphore release after invalidation,
+* write with 0 means semaphore release
+*/
+   if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
+   ring->funcs->vmhub == AMDGPU_MMHUB_1)
+   amdgpu_ring_emit_wreg(ring, sem, 0);
+
amdgpu_fence_emit_polling(ring, &seq);
amdgpu_ring_commit(ring);
spin_unlock_irqrestore(&kiq->ring_lock, flags);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
index b0b2bdc750df..bda6a2f37dc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, 
uint32_t reg);
 void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t 
v);
 void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
uint32_t reg0, uint32_t rreg1,
-   uint32_t ref, uint32_t mask);
+   uint32_t ref, uint32_t mask,
+   uint32_t sem);
 int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
 int amdgpu_virt_reset_gpu(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f25cd97ba5f2..1ae59af7836a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
*adev, uint32_t vmid,
!adev->in_gpu_reset) {
uint32_t req = hub->vm_inv_eng0_req + eng;
uint32_t ack = hub->vm_inv_eng0_ack + eng;
+   uint32_t sem = hub->vm_inv_eng0_sem + eng;
 
amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
-   1 << vmid);
+  1 << vmid, sem);
return;
}
 
-- 
2.17.1


Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Yuan, Xiaojie

Got it.
Thanks Hawking and Chris for your attention.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 5:04 PM
To: Yuan, Xiaojie; Koenig, Christian; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack; Ma, Le
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I'm okay with the re-initialization; it's the more straightforward approach.

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: 2019年11月20日 17:00
To: Zhang, Hawking ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I can try this approach as well.
This csb is similar to the kiq mqd: both are allocated in vram and might be 
corrupted after a baco reset.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König 
Sent: 2019年11月20日 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected 
that the content of VRAM is corrupted.

Christian.
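
The re-initialization idea discussed in this thread can be sketched with a
simulated device: when a reset is in progress, the buffer is refilled from a
known-good template before the hardware is pointed at it, and a mapping
failure is propagated to the caller. Everything here (struct fake_dev,
csb_template, fake_init_csb, the template values) is hypothetical and only
mirrors the shape of the patch, not the amdgpu buffer-object API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CSB_WORDS 4

/* Made-up clear state contents used as the known-good template. */
static const uint32_t csb_template[CSB_WORDS] = {
	0xc0de0001, 0x10, 0x20, 0x30
};

/* Hypothetical simulated device. */
struct fake_dev {
	bool in_reset;               /* a GPU reset is in progress */
	bool map_fails;              /* simulate a failing buffer mapping */
	uint32_t vram_csb[CSB_WORDS]; /* buffer contents "in vram" */
	uint32_t header_seen_by_hw;  /* first word the hardware would fetch */
};

/* Refill the buffer when coming out of a reset, then "program" it. */
static int fake_init_csb(struct fake_dev *dev)
{
	if (dev->in_reset) {
		if (dev->map_fails)
			return -1; /* propagate the mapping error */
		memcpy(dev->vram_csb, csb_template, sizeof(csb_template));
	}

	dev->header_seen_by_hw = dev->vram_csb[0];
	return 0;
}
```

Returning an error code instead of void is what lets the caller abort
initialization when the buffer cannot be rewritten.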

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, it is still not clear when the corruption actually 
> happens, right?
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of
> Zhang, Hawking
> Sent: 2019年11月20日 16:44
> To: Yuan, Xiaojie ;
> amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes 
> the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: 2019年11月20日 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Xiao, Jack
> ; Long, Gang ; Yuan, Xiaojie
> 
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after
> gpu reset
>
> This patch fixes the 2nd baco reset failure with gfxoff enabled on navi1x.
>
> The clear state buffer (which resides in vram) is corrupted after the 1st 
> baco reset; upon gfxoff exit, CPF gets a garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void 
> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>   WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> + int r;
> +
> + if (adev->in_gpu_reset) {
> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> + if (r)
> + return r;
> +
> + r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +(void **)&adev->gfx.rlc.cs_ptr);
> + if (!r) {
> + adev->gfx.rlc.funcs->get_csb_buffer(adev,
> + adev->gfx.rlc.cs_ptr);
> + amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> + }
> +
> + amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> + if (r)
> + return r;
> + }
> +
>   /* csib */
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>adev->gfx.rlc.clear_state_gpu_addr >> 32);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH,
> adev->gfx.rlc.clear_state_size);
> +
> + return 0;
>   }
>
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>   {
>   int i;
> + int r;
>
> - gfx_v10_0_init_csb(adev);
> + r = gfx_v10_0_init_csb(adev);
> + if (r)
> + return r;
>
>   for (i = 0; i < adev->num_vmhubs; i++)
>   amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>
>   /* TODO: init power gating */
> - return;
> + return 0;
>   }
>
>   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
>

RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Zhang, Hawking

I'm okay with re-initializing; it's the more straightforward approach.

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie  
Sent: 2019年11月20日 17:00
To: Zhang, Hawking ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I can try this approach as well.
The csb is similar to the kiq mqd: both are allocated in vram and might be
corrupted after a baco reset.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König 
Sent: 2019年11月20日 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected
that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, it is still not clear when the corruption actually
> happens, right?
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Zhang, Hawking
> Sent: 2019年11月20日 16:44
> To: Yuan, Xiaojie ; 
> amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer 
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes
> the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: 2019年11月20日 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Xiao, Jack 
> ; Long, Gang ; Yuan, Xiaojie 
> 
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after 
> gpu reset
>
> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>
> clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
> gfxoff exit, CPF gets garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void 
> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>   WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> + int r;
> +
> + if (adev->in_gpu_reset) {
> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> + if (r)
> + return r;
> +
> + r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +(void **)&adev->gfx.rlc.cs_ptr);
> + if (!r) {
> + adev->gfx.rlc.funcs->get_csb_buffer(adev,
> + adev->gfx.rlc.cs_ptr);
> + amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> + }
> +
> + amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> + if (r)
> + return r;
> + }
> +
>   /* csib */
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>adev->gfx.rlc.clear_state_gpu_addr >> 32);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH,
> adev->gfx.rlc.clear_state_size);
> +
> + return 0;
>   }
>
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>   {
>   int i;
> + int r;
>
> - gfx_v10_0_init_csb(adev);
> + r = gfx_v10_0_init_csb(adev);
> + if (r)
> + return r;
>
>   for (i = 0; i < adev->num_vmhubs; i++)
>   amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>
>   /* TODO: init power gating */
> - return;
> + return 0;
>   }
>
>   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
> static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>   r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
>   if (r)
>   return r;
> - gfx_v10_0_init_pg(adev);
> +
> + r = gfx_v10_0_init_pg(adev);
> + if (r)
> + return r;
>
>   /*

RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Zhang, Hawking

I see. Thanks for the clarification. The patch is

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie  
Sent: 2019年11月20日 16:56
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack ; Long, Gang 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Hi Hawking,

This is a formal fix I think.
As you suggested, I tried to dump csb in various locations in gpu reset 
sequence:
1. after gfx_v10_0_hw_fini() completes - csb still contains correct data
        |
        v
   baco reset
        |
        v
2. right after gmc resume - csb contains corrupted data

In this small time window no firmware touches the csb, which should prove that
the loss of vram contents is what corrupts the csb.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:43 PM
To: Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Long, Gang
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Just to make sure I understand you correctly: until the fw team root-causes
the csb corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: 2019年11月20日 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack ; 
Long, Gang ; Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.

clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
gfxoff exit, CPF gets garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,
WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }

-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
+
+   return 0;
 }

-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
 {
int i;
+   int r;

-   gfx_v10_0_init_csb(adev);
+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;

for (i = 0; i < adev->num_vmhubs; i++)
amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);

/* TODO: init power gating */
-   return;
+   return 0;
 }

 void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;

/* enable RLC SRM */
gfx_v10_0_rlc_enable_srm(adev); @@ -1937,7 +1965,10 @@ static 
int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
return r;
}

-   gfx_v10_0_init_pg(adev);
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);

if (adev->firmware.load_type == 
AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
--
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
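
The error handling the patch introduces (reserve, map, refill, unmap,
unreserve, then propagate the error up through init_pg and rlc_resume) can be
sketched in plain user-space C. This is only an illustration of the control
flow, not the driver code: struct fake_bo and every fake_* function are
hypothetical stand-ins invented for this sketch, and the error values are
arbitrary negative integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object, with injectable failures. */
struct fake_bo {
	int reserve_err; /* error returned by fake_reserve(), 0 = success */
	int map_err;     /* error returned by fake_map(), 0 = success */
	int reserved;    /* 1 while the reservation is held */
	int fills;       /* how many times the contents were regenerated */
};

int fake_reserve(struct fake_bo *bo)
{
	if (bo->reserve_err)
		return bo->reserve_err;
	bo->reserved = 1;
	return 0;
}

int fake_map(struct fake_bo *bo)
{
	return bo->map_err;
}

void fake_unreserve(struct fake_bo *bo)
{
	bo->reserved = 0;
}

/* Mirrors the shape of the patched init_csb: never leaks the reservation. */
int fake_refill(struct fake_bo *bo)
{
	int r;

	assert(bo != NULL);
	r = fake_reserve(bo);
	if (r)
		return r; /* nothing to undo yet */

	r = fake_map(bo);
	if (!r)
		bo->fills++; /* mapped: regenerate contents (unmap omitted) */

	fake_unreserve(bo); /* runs on both the success and the error path */
	return r;
}

/* Mirrors the patched init_pg: the caller now sees the failure. */
int fake_init_pg(struct fake_bo *bo)
{
	int r = fake_refill(bo);

	if (r)
		return r;
	return 0;
}
```

The point of the shape is that a failed map still drops the reservation before
the error is returned, and that changing the helpers from void to int lets the
resume path stop instead of starting the RLC with a stale buffer.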

Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Yuan, Xiaojie

I can try this approach as well.
The csb is similar to the kiq mqd: both are allocated in vram and might be
corrupted after a baco reset.

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König 
Sent: 2019年11月20日 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected
that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, it is still not clear when the corruption actually
> happens, right?
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of
> Zhang, Hawking
> Sent: 2019年11月20日 16:44
> To: Yuan, Xiaojie ;
> amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes
> the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: 2019年11月20日 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Xiao, Jack
> ; Long, Gang ; Yuan, Xiaojie
> 
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after
> gpu reset
>
> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>
> clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
> gfxoff exit, CPF gets garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void 
> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>   WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> + int r;
> +
> + if (adev->in_gpu_reset) {
> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> + if (r)
> + return r;
> +
> + r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +(void **)&adev->gfx.rlc.cs_ptr);
> + if (!r) {
> + adev->gfx.rlc.funcs->get_csb_buffer(adev,
> + adev->gfx.rlc.cs_ptr);
> + amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> + }
> +
> + amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> + if (r)
> + return r;
> + }
> +
>   /* csib */
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>adev->gfx.rlc.clear_state_gpu_addr >> 32);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH,
> adev->gfx.rlc.clear_state_size);
> +
> + return 0;
>   }
>
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>   {
>   int i;
> + int r;
>
> - gfx_v10_0_init_csb(adev);
> + r = gfx_v10_0_init_csb(adev);
> + if (r)
> + return r;
>
>   for (i = 0; i < adev->num_vmhubs; i++)
>   amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>
>   /* TODO: init power gating */
> - return;
> + return 0;
>   }
>
>   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
> static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>   r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
>   if (r)
>   return r;
> - gfx_v10_0_init_pg(adev);
> +
> + r = gfx_v10_0_init_pg(adev);
> + if (r)
> + return r;
>
>   /* enable RLC SRM */
>   gfx_v10_0_rlc_enable_srm(adev);
> @@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device 
> *adev)
>   return r;
>   }
>
> - gfx_v10_0_init_pg(adev);
> + r = gfx_v10_0_init_pg(adev);
> + if (r)
> + return r;

Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Christian König

While the GPU is locked up? Not a good idea at all: we need the SDMA for
this, and that would be rather unreliable.


If the content of the BO is constant we should just re-initialize it 
after the reset.
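
A minimal user-space sketch of that idea, assuming the buffer content depends
only on constants: instead of saving and restoring the buffer around the
reset, the generator that filled it in the first place is simply run again.
All names here (struct fake_dev, build_clear_state, fake_init_csb, CSB_WORDS)
and the fill pattern are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSB_WORDS 8u /* arbitrary size for this sketch */

/* Hypothetical device state: a reset flag plus the constant buffer. */
struct fake_dev {
	int in_reset;
	uint32_t csb[CSB_WORDS];
};

/*
 * Hypothetical generator. Its output depends only on constants, so calling
 * it again always reproduces the original buffer contents.
 */
void build_clear_state(uint32_t *buf, size_t words)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < words; i++)
		buf[i] = 0xC5B00000u | (uint32_t)i;
}

/* On the reset path, regenerate the buffer before it is used again. */
int fake_init_csb(struct fake_dev *dev)
{
	if (!dev)
		return -1;
	if (dev->in_reset)
		build_clear_state(dev->csb, CSB_WORDS);
	return 0;
}
```

Because the content is regenerated by the CPU, this path does not need any
copy engine to be working while the GPU is still being brought back up.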


Regards,
Christian.

On 20.11.19 at 09:54, Zhang, Hawking wrote:

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König 
Sent: 2019年11月20日 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected
that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:

Or in other words, it is still not clear when the corruption actually
happens, right?

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of
Zhang, Hawking
Sent: 2019年11月20日 16:44
To: Yuan, Xiaojie ;
amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer
after gpu reset

Just to make sure I understand you correctly: until the fw team root-causes
the csb corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: 2019年11月20日 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack
; Long, Gang ; Yuan, Xiaojie

Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after
gpu reset

This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.

clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
gfxoff exit, CPF gets garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
   1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,
WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
   
-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)

+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
   {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH,
adev->gfx.rlc.clear_state_size);
+
+   return 0;
   }
   
-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)

+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
   {
int i;
+   int r;
   
-	gfx_v10_0_init_csb(adev);

+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;
   
   	for (i = 0; i < adev->num_vmhubs; i++)

amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
   
   	/* TODO: init power gating */

-   return;
+   return 0;
   }
   
   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)

r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
   
   		/* enable RLC SRM */

gfx_v10_0_rlc_enable_srm(adev);
@@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device 
*adev)
return r;
}
   
-		gfx_v10_0_init_pg(adev);

+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);
   
   		if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO)

{
--
2.20.1

Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Yuan, Xiaojie

Hi Hawking,

This is a formal fix I think.
As you suggested, I tried to dump csb in various locations in gpu reset 
sequence:
1. after gfx_v10_0_hw_fini() completes - csb still contains correct data
        |
        v
   baco reset
        |
        v
2. right after gmc resume - csb contains corrupted data

In this small time window no firmware touches the csb, which should prove that
the loss of vram contents is what corrupts the csb.
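
The two-checkpoint comparison described above can be sketched in plain C:
take a copy of the buffer at checkpoint 1, then at checkpoint 2 report the
first byte that no longer matches. This is a user-space illustration under
stated assumptions, not the debugging code that was actually used; struct
snapshot, snapshot_take, snapshot_first_diff and SNAPSHOT_MAX are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SNAPSHOT_MAX 256u /* arbitrary bound for this sketch */

/* Copy of the buffer contents taken at the first checkpoint. */
struct snapshot {
	uint8_t data[SNAPSHOT_MAX];
	size_t len;
};

/* Checkpoint 1: remember what the buffer held before the reset. */
void snapshot_take(struct snapshot *s, const uint8_t *buf, size_t len)
{
	assert(s != NULL && buf != NULL && len <= SNAPSHOT_MAX);
	memcpy(s->data, buf, len);
	s->len = len;
}

/*
 * Checkpoint 2: return the offset of the first byte that differs from the
 * snapshot, or -1 if the contents are unchanged.
 */
long snapshot_first_diff(const struct snapshot *s, const uint8_t *buf)
{
	size_t i;

	assert(s != NULL && buf != NULL);
	for (i = 0; i < s->len; i++)
		if (s->data[i] != buf[i])
			return (long)i;
	return -1;
}
```

Moving the second checkpoint earlier or later in the sequence narrows the
window in which the contents change, which is the reasoning used above to
rule out firmware writes.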

BR,
Xiaojie


From: Zhang, Hawking 
Sent: Wednesday, November 20, 2019 4:43 PM
To: Yuan, Xiaojie; amd-gfx@lists.freedesktop.org
Cc: Xiao, Jack; Long, Gang
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Just to make sure I understand you correctly: until the fw team root-causes
the csb corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: 2019年11月20日 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack ; 
Long, Gang ; Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.

clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
gfxoff exit, CPF gets garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,
WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }

-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
+
+   return 0;
 }

-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
 {
int i;
+   int r;

-   gfx_v10_0_init_csb(adev);
+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;

for (i = 0; i < adev->num_vmhubs; i++)
amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);

/* TODO: init power gating */
-   return;
+   return 0;
 }

 void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;

/* enable RLC SRM */
gfx_v10_0_rlc_enable_srm(adev);
@@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device 
*adev)
return r;
}

-   gfx_v10_0_init_pg(adev);
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);

if (adev->firmware.load_type == 
AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
--
2.20.1

RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Zhang, Hawking

I think we should evict the bo and then move it back.

Regards,
Hawking

-Original Message-
From: Christian König  
Sent: 2019年11月20日 16:47
To: Zhang, Hawking ; Yuan, Xiaojie 
; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

A baco reset also resets the MC, doesn't it? In this case it would be expected
that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, it is still not clear when the corruption actually
> happens, right?
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Zhang, Hawking
> Sent: 2019年11月20日 16:44
> To: Yuan, Xiaojie ; 
> amd-gfx@lists.freedesktop.org
> Cc: Long, Gang ; Xiao, Jack 
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer 
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes
> the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -Original Message-
> From: Yuan, Xiaojie 
> Sent: 2019年11月20日 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking ; Xiao, Jack 
> ; Long, Gang ; Yuan, Xiaojie 
> 
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after 
> gpu reset
>
> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>
> clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
> gfxoff exit, CPF gets garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan 
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void 
> gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>   WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
>   
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> + int r;
> +
> + if (adev->in_gpu_reset) {
> + r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> + if (r)
> + return r;
> +
> + r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +(void **)&adev->gfx.rlc.cs_ptr);
> + if (!r) {
> + adev->gfx.rlc.funcs->get_csb_buffer(adev,
> + adev->gfx.rlc.cs_ptr);
> + amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> + }
> +
> + amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> + if (r)
> + return r;
> + }
> +
>   /* csib */
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>adev->gfx.rlc.clear_state_gpu_addr >> 32);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
>   WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, 
> adev->gfx.rlc.clear_state_size);
> +
> + return 0;
>   }
>   
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>   {
>   int i;
> + int r;
>   
> - gfx_v10_0_init_csb(adev);
> + r = gfx_v10_0_init_csb(adev);
> + if (r)
> + return r;
>   
>   for (i = 0; i < adev->num_vmhubs; i++)
>   amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>   
>   /* TODO: init power gating */
> - return;
> + return 0;
>   }
>   
>   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ 
> static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>   r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
>   if (r)
>   return r;
> - gfx_v10_0_init_pg(adev);
> +
> + r = gfx_v10_0_init_pg(adev);
> + if (r)
> + return r;
>   
>   /* enable RLC SRM */
>   gfx_v10_0_rlc_enable_srm(adev);
> @@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device 
> *adev)
>   return r;
>   }
>   
> - gfx_v10_0_init_pg(adev);
> + r = gfx_v10_0_init_pg(adev);
> + if (r)
> + return r;
> +
>   adev->gfx.rlc.funcs->start(adev);
>   
>   if (adev->firmware.load_type == 
> AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) 
> {
> --
> 2.20.1

Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Christian König

A baco reset also resets the MC, doesn't it? In this case it would be
expected that the content of VRAM is corrupted.


Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:

Or in other words, it is still not clear when the corruption actually
happens, right?

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: 2019年11月20日 16:44
To: Yuan, Xiaojie ; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Just to make sure I understand you correctly: until the fw team root-causes
the csb corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie 
Sent: 2019年11月20日 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack ; Long, Gang 
; Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.

clear state buffer (resides in vram) is corrupted after 1st baco reset, upon 
gfxoff exit, CPF gets garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
  1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct 
amdgpu_device *adev,
WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);  }
  
-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)

+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
  {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
+
+   return 0;
  }
  
-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)

+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
  {
int i;
+   int r;
  
-	gfx_v10_0_init_csb(adev);

+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;
  
  	for (i = 0; i < adev->num_vmhubs; i++)

amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
  
  	/* TODO: init power gating */

-   return;
+   return 0;
  }
  
  void gfx_v10_0_rlc_stop(struct amdgpu_device *adev) @@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)

r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
  
  		/* enable RLC SRM */

gfx_v10_0_rlc_enable_srm(adev);
@@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device 
*adev)
return r;
}
  
-		gfx_v10_0_init_pg(adev);

+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);
  
  		if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {

--
2.20.1


RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Zhang, Hawking

Or in other words, it is still not clear when the corruption actually
happens, right?

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: 2019年11月20日 16:44
To: Yuan, Xiaojie ; amd-gfx@lists.freedesktop.org
Cc: Long, Gang ; Xiao, Jack 
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu 
reset

Just to make sure I understand you correctly: until the fw team root-causes
the csb corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie  
Sent: November 20, 2019 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack ; 
Long, Gang ; Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

This patch fixes the failure of the second BACO reset with gfxoff enabled on 
navi1x.

The clear state buffer (which resides in VRAM) is corrupted after the first 
BACO reset; upon gfxoff exit, CPF gets a garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
 	WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
 }
 
-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffffffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
+
+   return 0;
 }
 
-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
 {
int i;
+   int r;
 
-   gfx_v10_0_init_csb(adev);
+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;
 
for (i = 0; i < adev->num_vmhubs; i++)
amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
 
/* TODO: init power gating */
-   return;
+   return 0;
 }
 
 void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
@@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
 
/* enable RLC SRM */
gfx_v10_0_rlc_enable_srm(adev);
@@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
return r;
}
 
-   gfx_v10_0_init_pg(adev);
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);
 
if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
--
2.20.1
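
For readers following the thread, the control flow that the patch adds to gfx_v10_0_init_csb() (reserve the buffer, map it, regenerate its contents, unmap, unreserve, and propagate any error) can be sketched in plain user-space C. This is only an illustrative model: struct fake_bo and every fake_* function below are hypothetical stand-ins, not amdgpu APIs, and the fill pattern is made up.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object that lives in VRAM. */
struct fake_bo {
	uint32_t data[8];
	int reserved;      /* 1 while the object is reserved */
	int mapped;        /* 1 while the object is CPU-mapped */
	int fail_reserve;  /* knobs to simulate failures */
	int fail_map;
};

static int fake_bo_reserve(struct fake_bo *bo)
{
	if (bo->fail_reserve)
		return -EBUSY;
	bo->reserved = 1;
	return 0;
}

static void fake_bo_unreserve(struct fake_bo *bo)
{
	bo->reserved = 0;
}

static int fake_bo_kmap(struct fake_bo *bo, uint32_t **ptr)
{
	if (bo->fail_map)
		return -ENOMEM;
	bo->mapped = 1;
	*ptr = bo->data;
	return 0;
}

static void fake_bo_kunmap(struct fake_bo *bo)
{
	bo->mapped = 0;
}

/* Regenerate known-good contents (the pattern is arbitrary). */
static void fake_fill_csb(uint32_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = 0xC5B00000u | (uint32_t)i;
}

/* Mirrors the shape of the patched function: refill only during reset. */
static int fake_init_csb(struct fake_bo *bo, int in_reset)
{
	uint32_t *ptr;
	int r;

	if (in_reset) {
		r = fake_bo_reserve(bo);
		if (r)
			return r;

		r = fake_bo_kmap(bo, &ptr);
		if (!r) {
			fake_fill_csb(ptr, 8);
			fake_bo_kunmap(bo);
		}

		/* Always drop the reservation, even if mapping failed. */
		fake_bo_unreserve(bo);
		if (r)
			return r;
	}

	/* The real driver programs the CSIB address and length here. */
	return 0;
}
```

The point of the shape is that the reservation is released on both the success and the failure path, and that callers such as an init_pg-style wrapper can now see and forward the error instead of silently continuing.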


RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

2019-11-20 Thread Zhang, Hawking

Just to make sure I understand you correctly: until the fw team root-causes 
the CSB corruption, we keep the workaround in the driver, correct?

Regards,
Hawking
-Original Message-
From: Yuan, Xiaojie  
Sent: November 20, 2019 14:47
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Xiao, Jack ; 
Long, Gang ; Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

This patch fixes the failure of the second BACO reset with gfxoff enabled on 
navi1x.

The clear state buffer (which resides in VRAM) is corrupted after the first 
BACO reset; upon gfxoff exit, CPF gets a garbage header in CSIB and hangs.

Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 9274bd4b6c68..8e24ea08ca39 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
 	WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
 }
 
-static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
+static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
 {
+   int r;
+
+   if (adev->in_gpu_reset) {
+   r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
+   if (r)
+   return r;
+
+   r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
+  (void **)&adev->gfx.rlc.cs_ptr);
+   if (!r) {
+   adev->gfx.rlc.funcs->get_csb_buffer(adev,
+   adev->gfx.rlc.cs_ptr);
+   amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
+   }
+
+   amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
+   if (r)
+   return r;
+   }
+
/* csib */
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
 adev->gfx.rlc.clear_state_gpu_addr >> 32);
WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
 adev->gfx.rlc.clear_state_gpu_addr & 0xfffffffc);
WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
+
+   return 0;
 }
 
-static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
+static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
 {
int i;
+   int r;
 
-   gfx_v10_0_init_csb(adev);
+   r = gfx_v10_0_init_csb(adev);
+   if (r)
+   return r;
 
for (i = 0; i < adev->num_vmhubs; i++)
amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
 
/* TODO: init power gating */
-   return;
+   return 0;
 }
 
 void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
@@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
if (r)
return r;
-   gfx_v10_0_init_pg(adev);
+
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
 
/* enable RLC SRM */
gfx_v10_0_rlc_enable_srm(adev);
@@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
return r;
}
 
-   gfx_v10_0_init_pg(adev);
+   r = gfx_v10_0_init_pg(adev);
+   if (r)
+   return r;
+
adev->gfx.rlc.funcs->start(adev);
 
if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
--
2.20.1


Re: Potential BUG: [PATCH 2/8] drm/amdgpu: add a generic fb accessing helper function(v3)

2019-11-20 Thread Christian König


Hi Luben & Iago,

No, the code is completely correct and intentional as written.

The code path that takes the lock in amdgpu_mm_wreg() is a workaround for 
when rmmio_size doesn't allow access to the full register BAR.

In this case the MM_INDEX/MM_DATA registers are used as a side path, and 
because of this we need the lock.

But this case can't happen when you use the MM_INDEX/MM_DATA pair 
directly as parameters for the function.


Regards,
Christian.
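
The argument above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver code: struct fake_dev, the fake_* functions, the register indices and the aperture size are all hypothetical, and the spinlock is modeled as a depth counter so that a double acquisition would show up as a depth of 2.

```c
#include <stdint.h>

/* Hypothetical register indices and aperture size, for illustration only. */
#define MM_INDEX   0u
#define MM_DATA    1u
#define RMMIO_SIZE 64u  /* bytes that are directly mapped */

struct fake_dev {
	uint32_t rmmio[RMMIO_SIZE / 4]; /* directly mapped registers */
	uint32_t far_regs[256];         /* registers reachable only indirectly */
	int lock_depth;                 /* models a non-recursive lock */
	int max_lock_depth;             /* 2 would mean a double lock */
};

static void fake_lock(struct fake_dev *d)
{
	d->lock_depth++;
	if (d->lock_depth > d->max_lock_depth)
		d->max_lock_depth = d->lock_depth;
}

static void fake_unlock(struct fake_dev *d)
{
	d->lock_depth--;
}

/* Register write with the same two-path shape as described in the mail. */
static void fake_wreg(struct fake_dev *d, uint32_t reg, uint32_t v)
{
	if (reg * 4 < RMMIO_SIZE) {
		/* Direct path: the register is inside the mapped range. */
		d->rmmio[reg] = v;
	} else {
		/* Side path: go through the index/data pair under the lock. */
		fake_lock(d);
		d->rmmio[MM_INDEX] = reg * 4;
		d->rmmio[MM_DATA] = v;
		d->far_regs[reg % 256] = v; /* model the hardware forwarding */
		fake_unlock(d);
	}
}

/* A caller that holds the lock and passes the index/data pair itself. */
static void fake_vram_write(struct fake_dev *d, uint32_t pos, uint32_t v)
{
	fake_lock(d);
	fake_wreg(d, MM_INDEX, pos | 0x80000000u);
	fake_wreg(d, MM_DATA, v);
	fake_unlock(d);
}
```

Because MM_INDEX and MM_DATA sit inside the directly mapped range in this model, fake_vram_write() only ever reaches the direct branch of fake_wreg(), so the lock depth never exceeds 1. A static analyzer that does not track the value of `reg` cannot rule out the else-branch, which would explain the report.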

On 20.11.19 at 01:23, Luben Tuikov wrote:

Hi Iago,

Thank you for finding and reporting this potential double lock.

Yes indeed, I see it--it can indeed happen.

Now, since the primitives used (macros using "amdgpu_mm_(r|w)reg\(.*\)") in
"amdgpu_device_vram_access()" do use their own register-access spinlocks,
it may be wise to remove the spinlock take/release in 
"amdgpu_device_vram_access()".

We'll look into it and possibly submit another patch.

Thanks again.

Regards,
Luben

On 2019-11-16 11:21 a.m., Iago Abal wrote:

Hi,

With the help of a static bug finder (EBA - https://github.com/IagoAbal/eba) I 
have found a potential double lock in Linux Next tag next-20191115, file 
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c.

This bug seems to be introduced by commit
e35e2b117f4 ("drm/amdgpu: add a generic fb accessing helper function(v3)").

The steps to reproduce it would be:

1. Start in function `amdgpu_device_vram_access`.
2. Enter for-loop `for (last += pos; pos <= last; pos += 4)`.
3. First lock: `spin_lock_irqsave(&adev->mmio_idx_lock, flags)`.
4. Call to `WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)pos) | 0x80000000)`.
    5. Note `#define WREG32_NO_KIQ(reg, v) amdgpu_mm_wreg(adev, (reg), (v), AMDGPU_REGS_NO_KIQ)`.
    6. Continue in function `amdgpu_mm_wreg`.
    7. Take else-branch in the third if-statement.
    8. Double lock: `spin_lock_irqsave(&adev->mmio_idx_lock, flags)`.

I think the control flow could reach that second lock, but you may know better.

Hope it helps!

-- iago


RE: [PATCH] drm/amdgpu/gfx10: fix out-of-bound mqd_backup array access

2019-11-20 Thread Zhang, Hawking

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Xiaojie Yuan
Sent: November 20, 2019 14:48
To: amd-gfx@lists.freedesktop.org
Cc: Yuan, Xiaojie 
Subject: [PATCH] drm/amdgpu/gfx10: fix out-of-bound mqd_backup array access

Fixes: 4990f957c845 ("drm/amdgpu/gfx10: fix mqd backup/restore for gfx rings")
Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index a492174ef29b..52c27e49bc7b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -454,8 +454,6 @@ void amdgpu_gfx_mqd_sw_fini(struct amdgpu_device *adev)
}
 
 	ring = &adev->gfx.kiq.ring;
-	if (adev->asic_type >= CHIP_NAVI10 && amdgpu_async_gfx_ring)
-		kfree(adev->gfx.me.mqd_backup[AMDGPU_MAX_GFX_RINGS]);
 	kfree(adev->gfx.mec.mqd_backup[AMDGPU_MAX_COMPUTE_RINGS]);
 	amdgpu_bo_free_kernel(&ring->mqd_obj,
 			      &ring->mqd_gpu_addr,
-- 
2.20.1


Re: [PATCH 1/5] drm/amdgpu: expand sdma copy_buffer interface with tmz parameter

2019-11-20 Thread Christian König


On 20.11.19 at 04:06, Liu, Aaron wrote:


BR,
Aaron Liu


-Original Message-
From: Tuikov, Luben 
Sent: Wednesday, November 20, 2019 7:12 AM
To: Liu, Aaron ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
; Koenig, Christian 
Subject: Re: [PATCH 1/5] drm/amdgpu: expand sdma copy_buffer interface
with tmz parameter

I wonder if we really do need yet another function argument, thus increasing
the argument list, or if the "tmz" boolean can/is already a property of the
job/command/ib/etc., and if it can indeed be had from the latter entity...?


Hi Luben,
In fact, I also thought about it. Compared with attaching this property to 
other entities, passing it as an argument is perhaps clearer and simpler. 
Another reason is that TMZ is a relatively independent property.


That idea came to my mind as well while reviewing this, but there is one 
killer argument why we need this:


For correct eviction of encrypted buffers we will want to do a secure 
copy to a staging area and then an unsecure copy to the final destination.


So you need a secure and an unsecure copy in the same IB (or go to the 
extra hassle of emitting two IBs).


Regards,
Christian.
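
A minimal sketch of why a per-call flag helps with the eviction case described above: with the flag on each copy, a single IB can hold one secure and one unsecure copy. Everything here is a hypothetical user-space model (struct fake_ib, struct fake_copy_pkt, fake_emit_copy, fake_evict_encrypted are invented names), not the SDMA packet format or the amdgpu interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_IB_MAX 8

/* Hypothetical copy command; the real SDMA packet layout differs. */
struct fake_copy_pkt {
	uint64_t src;
	uint64_t dst;
	uint32_t bytes;
	bool tmz;  /* true: secure (TMZ) copy, false: unsecure copy */
};

/* Hypothetical indirect buffer holding a short list of copy commands. */
struct fake_ib {
	struct fake_copy_pkt pkt[FAKE_IB_MAX];
	size_t count;
};

/* The flag is a property of each emitted copy, not of the whole IB. */
static int fake_emit_copy(struct fake_ib *ib, uint64_t src, uint64_t dst,
			  uint32_t bytes, bool tmz)
{
	if (ib->count >= FAKE_IB_MAX)
		return -1;
	ib->pkt[ib->count++] = (struct fake_copy_pkt){ src, dst, bytes, tmz };
	return 0;
}

/*
 * Eviction as described in the mail: a secure copy into a staging area,
 * followed by an unsecure copy to the final destination, both in one IB.
 */
static int fake_evict_encrypted(struct fake_ib *ib, uint64_t src,
				uint64_t staging, uint64_t dst,
				uint32_t bytes)
{
	int r = fake_emit_copy(ib, src, staging, bytes, true);

	if (r)
		return r;
	return fake_emit_copy(ib, staging, dst, bytes, false);
}
```

If the flag were a property of the IB or the job instead, the two copies in fake_evict_encrypted() would have to be split across two submissions, which is the extra hassle mentioned above.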




Regards,
Luben

On 2019-11-18 12:18 a.m., Aaron Liu wrote:

This patch expands sdma copy_buffer interface with tmz parameter.

Signed-off-by: Aaron Liu 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  | 4 ++--
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c    | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c   | 3 ++-
 drivers/gpu/drm/amd/amdgpu/si_dma.c      | 3 ++-
 8 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
index 761ff8b..b313465 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h
@@ -79,7 +79,8 @@ struct amdgpu_buffer_funcs {
 /* dst addr in bytes */
 uint64_t dst_offset,
 /* number of byte to transfer */
-uint32_t byte_count);
+uint32_t byte_count,
+bool tmz);

/* maximum bytes in a single operation */
uint32_t fill_max_bytes;
@@ -97,7 +98,7 @@ struct amdgpu_buffer_funcs {
 uint32_t byte_count);
  };

-#define amdgpu_emit_copy_buffer(adev, ib, s, d, b) (adev)->mman.buffer_funcs->emit_copy_buffer((ib),  (s), (d), (b))
+#define amdgpu_emit_copy_buffer(adev, ib, s, d, b, t) (adev)->mman.buffer_funcs->emit_copy_buffer((ib),  (s), (d), (b), (t))
 #define amdgpu_emit_fill_buffer(adev, ib, s, d, b) (adev)->mman.buffer_funcs->emit_fill_buffer((ib), (s), (d), (b))

  struct amdgpu_sdma_instance *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 339088d..c08c15e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2022,7 +2022,7 @@ static int amdgpu_map_buffer(struct ttm_buffer_object *bo,
 	dst_addr = amdgpu_bo_gpu_offset(adev->gart.bo);
 	dst_addr += window * AMDGPU_GTT_MAX_TRANSFER_SIZE * 8;
 	amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr,
-				dst_addr, num_bytes);
+				dst_addr, num_bytes, false);
 
 	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
 	WARN_ON(job->ibs[0].length_dw > num_dw);
@@ -2093,7 +2093,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset,
 		uint32_t cur_size_in_bytes = min(byte_count, max_bytes);
 
 		amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_offset,
-					dst_offset, cur_size_in_bytes);
+					dst_offset, cur_size_in_bytes, false);
 
 		src_offset += cur_size_in_bytes;
 		dst_offset += cur_size_in_bytes;
diff --git a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
index c45304f..82cdb8f 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik_sdma.c
@@ -1313,7 +1313,8 @@ static void cik_sdma_set_irq_funcs(struct amdgpu_device *adev)
 static void cik_sdma_emit_copy_buffer(struct amdgpu_ib *ib,
 				      uint64_t src_offset,
 				      uint64_t dst_offset,
-				      uint32_t byte_count)
+				      uint32_t byte_count,
+				      bool tmz)
 {
 	ib->ptr[ib->length_dw++] = SDMA_PACKET(SDMA_OPCODE_COPY, SDMA_COPY_SUB_OPCODE_LINEAR, 0);
 	ib->ptr[ib->length_dw++] = byte_count;
diff --git
