Re: [PATCH] drm/amdgpu: Skip some registers config for SRIOV
Reviewed-by: Luben Tuikov On 2020-08-07 04:48, Liu ChengZhe wrote: > Some registers are not accessible to virtual function setup, so > skip their initialization when in VF-SRIOV mode. > > v2: move SRIOV VF check into specify functions; > modify commit description and comment. > > Signed-off-by: Liu ChengZhe > --- > drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 19 +++ > drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 19 +++ > 2 files changed, 38 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > index 1f6112b7fa49..80c906a0383f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c > @@ -182,6 +182,12 @@ static void gfxhub_v2_1_init_cache_regs(struct > amdgpu_device *adev) > { > uint32_t tmp; > > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > /* Setup L2 cache */ > tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL); > tmp = REG_SET_FIELD(tmp, GCVM_L2_CNTL, ENABLE_L2_CACHE, 1); > @@ -237,6 +243,12 @@ static void gfxhub_v2_1_enable_system_domain(struct > amdgpu_device *adev) > > static void gfxhub_v2_1_disable_identity_aperture(struct amdgpu_device *adev) > { > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, >0x); > WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_HI32, > @@ -373,6 +385,13 @@ void gfxhub_v2_1_set_fault_enable_default(struct > amdgpu_device *adev, > bool value) > { > u32 tmp; > + > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. 
> + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL); > tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL, > RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); > diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > index d83912901f73..8acb3b625afe 100644 > --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c > @@ -181,6 +181,12 @@ static void mmhub_v2_0_init_cache_regs(struct > amdgpu_device *adev) > { > uint32_t tmp; > > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > /* Setup L2 cache */ > tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_CNTL); > tmp = REG_SET_FIELD(tmp, MMVM_L2_CNTL, ENABLE_L2_CACHE, 1); > @@ -236,6 +242,12 @@ static void mmhub_v2_0_enable_system_domain(struct > amdgpu_device *adev) > > static void mmhub_v2_0_disable_identity_aperture(struct amdgpu_device *adev) > { > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > WREG32_SOC15(MMHUB, 0, >mmMMVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, >0x); > @@ -365,6 +377,13 @@ void mmhub_v2_0_gart_disable(struct amdgpu_device *adev) > void mmhub_v2_0_set_fault_enable_default(struct amdgpu_device *adev, bool > value) > { > u32 tmp; > + > + /* These registers are not accessible to VF-SRIOV. > + * The PF will program them instead. > + */ > + if (amdgpu_sriov_vf(adev)) > + return; > + > tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_PROTECTION_FAULT_CNTL); > tmp = REG_SET_FIELD(tmp, MMVM_L2_PROTECTION_FAULT_CNTL, > RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[pull] amdgpu drm-fixes-5.9
Hi Dave, Daniel, Fixes for 5.9. The following changes since commit dc100bc8fae59aafd2ea2e1a1a43ef1f65f8a8bc: Merge tag 'drm-msm-next-2020-07-30' of https://gitlab.freedesktop.org/drm/msm into drm-next (2020-08-05 08:05:31 +1000) are available in the Git repository at: git://people.freedesktop.org/~agd5f/linux tags/amd-drm-fixes-5.9-2020-08-07 for you to fetch changes up to f87812284172a9809820d10143b573d833cd3f75: drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume (2020-08-07 17:52:15 -0400) amd-drm-fixes-5.9-2020-08-07: amdgpu: - Re-add spelling typo fix - Sienna Cichlid fixes - Navy Flounder fixes - DC fixes - SMU i2c fix - Power fixes Alex Deucher (1): drm/amdgpu/smu: rework i2c adpater registration Alvin Lee (1): drm/amd/display: Don't compare dppclk before updating DTO Aric Cyr (2): drm/amd/display: Fix DP Compliance tests 4.3.2.1 and 4.3.2.2 drm/amd/display: AMD OUI (DPCD 0x00300) skipped on some sink Bhawanpreet Lakha (2): drm/amd/display: Use seperate dmcub firmware for navy_flounder drm/amd/display: Use proper abm/backlight functions for DCN3 Boyuan Zhang (1): drm/amdgpu: update dec ring test for VCN 3.0 Changfeng (2): Revert "drm/amd/powerplay: drop unnecessary message support check" drm/amd/powerplay: drop unnecessary message support check(v2) Colin Ian King (1): drm/amdgpu: fix spelling mistake "Falied" -> "Failed" Dan Carpenter (1): drm/amd/powerplay: off by one bugs in smu_cmn_to_asic_specific_index() Dmytro Laktyushkin (2): drm/amd/display: Clean up global sync param retrieval drm/amd/display: populate new dml variable Eric Bernstein (1): drm/amd/display: Use parameter for call to set output mux Eryk Brol (2): drm/amd/display: Rename bytes_pp to the correct bits_pp drm/amd/display: Fix naming of DSC Debugfs entry Evan Quan (2): drm/amd/powerplay: update swSMU VCN/JPEG PG logics drm/amd/powerplay: put VCN/JPEG into PG ungate state before dpm table setup(V3) George Shen (1): drm/amd/display: Change null plane state swizzle mode to 
4kb_s Guchun Chen (1): drm/amdgpu: add printing after executing page reservation to eeprom Harry Wentland (1): drm/amd/display: Fix logger context Huang Rui (1): drm/amdgpu: skip crit temperature values on APU (v2) Igor Kravchenko (2): drm/amd/display: Read VBIOS Golden Settings Tbl drm/amd/display: Display goes blank after inst James Zhu (1): drm/amdgpu/jpeg3.0: remove extra asic type check Jiansong Chen (3): drm/amd/powerplay: update driver if version for navy_flounder drm/amdgpu: update GC golden setting for navy_flounder drm/amdgpu: enable GFXOFF for navy_flounder JinZe.Xu (1): drm/amd/display: Use helper function to check for HDMI signal John Clements (1): drm/amdgpu: expand sienna chichlid reg access support Jun Lei (1): drm/amd/display: Disable idle optimizations before programming DCN Kenneth Feng (1): drm/amd/powerplay: remove the dpm checking in the boot sequence Kevin Wang (1): drm/amd/swsmu: allow asic to handle sensor type by itself Likun Gao (6): drm/amd/powerplay: skip invalid msg when smu set mp1 state drm/amd/powerplay: add msg map for mode1 reset drm/amd/powerplay: correct smu message for vf mode drm/amdgpu: update golden setting for sienna_cichlid drm/amd/powerplay: update driver if file for sienna_cichlid drm/amdgpu: use mode1 reset by default for sienna_cichlid Liu ChengZhe (2): drm/amdgpu: fix PSP autoload twice in FLR drm amdgpu: Skip tmr load for SRIOV Martin Tsai (1): drm/amd/display: Check lane status again after link training done Reza Amini (1): drm/amd/display: Allow asic specific FSFT timing optimization Sandeep Raghuraman (1): drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume Stylon Wang (1): drm/amd/display: Fix dmesg warning from setting abm level Wyatt Wood (1): drm/amd/display: Use hw lock mgr hersen wu (1): drm/amd/display: dchubbub p-state warning during surface planes switch drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 + 
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 37 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 5 +- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 7 +- drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 9 +- drivers/gpu/drm/amd/amdgpu/nv.c| 56 +++- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 30 - .../drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c | 11 +- drivers/gpu/drm/amd/display/dc/bios/bios_parser
Re: [PATCH 8/8] drm/amd/display: Expose modifiers.
On Tue, Aug 4, 2020 at 5:32 PM Bas Nieuwenhuizen wrote: > This expose modifier support on GFX9+. > > Only modifiers that can be rendered on the current GPU are > added. This is to reduce the number of modifiers exposed. > > The HW could expose more, but the best mechanism to decide > what to expose without an explosion in modifiers is still > to be decided, and in the meantime this should not regress > things from pre-modifiers and does not risk regressions as > we make up our mind in the future. > > Signed-off-by: Bas Nieuwenhuizen > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 343 +- > 1 file changed, 342 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index c38257081868..6594cbe625f9 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -3891,6 +3891,340 @@ fill_gfx9_tiling_info_from_modifier(const struct > amdgpu_device *adev, > } > } > > +enum dm_micro_swizzle { > + MICRO_SWIZZLE_Z = 0, > + MICRO_SWIZZLE_S = 1, > + MICRO_SWIZZLE_D = 2, > + MICRO_SWIZZLE_R = 3 > +}; > + > +static bool dm_plane_format_mod_supported(struct drm_plane *plane, > + uint32_t format, > + uint64_t modifier) > +{ > + struct amdgpu_device *adev = plane->dev->dev_private; > + const struct drm_format_info *info = drm_format_info(format); > + > + enum dm_micro_swizzle microtile = > modifier_gfx9_swizzle_mode(modifier) & 3; > + > + if (!info) > + return false; > + > + /* > +* We always have to allow this modifier, because core DRM still > +* checks LINEAR support if userspace does not provide modifers. > +*/ > + if (modifier == DRM_FORMAT_MOD_LINEAR) > + return true; > + > + /* > +* The arbitrary tiling support for multiplane formats has not > been hooked > +* up. 
> +*/ > + if (info->num_planes > 1) > + return false; > + > + /* > +* For D swizzle the canonical modifier depends on the bpp, so > check > +* it here. > +*/ > + if (AMD_FMT_MOD_GET(TILE_VERSION, modifier) == > AMD_FMT_MOD_TILE_VER_GFX9 && > + adev->family >= AMDGPU_FAMILY_NV) { > + if (microtile == MICRO_SWIZZLE_D && info->cpp[0] == 4) > + return false; > + } > + > + if (adev->family >= AMDGPU_FAMILY_RV && microtile == > MICRO_SWIZZLE_D && > + info->cpp[0] < 8) > + return false; > + > + if (modifier_has_dcc(modifier)) { > + /* Per radeonsi comments 16/64 bpp are more complicated. */ > + if (info->cpp[0] != 4) > + return false; > + } > + > + return true; > +} > + > +static void > +add_modifier(uint64_t **mods, uint64_t *size, uint64_t *cap, uint64_t mod) > +{ > + if (!*mods) > + return; > + > + if (*cap - *size < 1) { > + uint64_t new_cap = *cap * 2; > + uint64_t *new_mods = kmalloc(new_cap * sizeof(uint64_t), > GFP_KERNEL); > + > + if (!new_mods) { > + kfree(*mods); > + *mods = NULL; > + return; > + } > + > + memcpy(new_mods, *mods, sizeof(uint64_t) * *size); > + kfree(*mods); > + *mods = new_mods; > + *cap = new_cap; > + } > + > + (*mods)[*size] = mod; > + *size += 1; > +} > + > +static void > +add_gfx9_modifiers(const struct amdgpu_device *adev, > + uint64_t **mods, uint64_t *size, uint64_t *capacity) > +{ > + int pipes = > ilog2(adev->gfx.config.gb_addr_config_fields.num_pipes); > + int pipe_xor_bits = min(8, pipes + > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_se)); > + int bank_xor_bits = min(8 - pipe_xor_bits, > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_banks)); > + int rb = ilog2(adev->gfx.config.gb_addr_config_fields.num_se) + > + > ilog2(adev->gfx.config.gb_addr_config_fields.num_rb_per_se); > + > + > + if (adev->family == AMDGPU_FAMILY_RV) { > + /* > +* No _D DCC swizzles yet because we only allow 32bpp, > which > +* doesn't support _D on DCN > +*/ > + > + /* > +* Always enable constant encoding, because the only unit > that > +* 
didn't support it was CB. But on texture/display we can > +* always interpret it. > +*/ > + add_modifier(mods, size, capacity, AMD_FMT_MOD | > + AMD_FMT_MOD_SET(TILE, > AMD_FMT_MOD_TILE_GFX9_64K_S_X) | > + AMD_FMT_MOD
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Official Use Only - Internal Distribution Only] On Fri, Aug 7, 2020 at 5:32 PM Li, Dennis wrote: > > [AMD Public Use] > > On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > > > [AMD Public Use] > > > > Hi, Daniel, > > Thanks your review. And I also understand your concern. I guess you > > missed the description of this issue, so I paste it again in the below, and > > explain why this issue happens. > > > > For example, in a XGMI system with 2 GPU devices whose device entity > > is named adev. And each adev has a separated reset_sem. > > // device init function > > void device_init(adev) { > > ... > > init_rwsem(&adev->reset_sem); > > ... > > } > > > > XGMI system have two GPU devices, so system will call device_init > > twice. However the definition of init_rwsem macro has a limitation which > > use a local static lock_class_key to initialize rw_sem, which cause each > > adev->reset_sem share the same lock_class_key. > > > > #define init_rwsem(sem) > > \ > > do { \ > > static struct lock_class_key __key;\ > > \ > > __init_rwsem((sem), #sem, &__key); \ > > } while (0) > > > > And when GPU hang, we should do gpu recovery for all devices in the > > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > > accessed by other threads during recovery. The detailed recovery sequence > > as the following: > > // Step1: lock all GPU firstly: > > for each adev of GPU0 or GPU1 { > >down_write(adev->reset_sem); > >do_suspend(adev); > > } > > > > // Step2: > > do_hive_recovery(hive); > > > > // Step3: resume and unlock GPU > > for each adev of GPU0 or GPU1 { > > do_resume(adev) > > up_write(adev->reset_sem); > > } > > > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > > lockdep will pop the below warning. > > > > I have considered your proposal (using _nest_lock() ) before. 
Just as > > you said, _nest_lock() will hide true positive recursive locking. So I gave > > up it in the end. > > > > After reviewing codes of lockdep, I found the lockdep support > > dynamic_key, so using separated lock_class_key has no any side effect. In > > fact, init_rwsem also use dynamic_key. Please see the call sequence of > > init_rwsem and lockdep_set_class as the below: > >1) init_rwsem -> __init_rwsem -> lockdep_init_map; > >2) lockdep_set_class -> lockdep_init_map; > > > > Finally we go back to your concern, you maybe worry this change will > > cause the below dead-lock can't be detected. In fact, lockdep still is able > > to detect the issue as circular locking dependency, but there is no warning > > "recursive locking " in this case. > > Thread A: down_write(adev->reset_sem) for GPU 0 -> > > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) > > for GPU 1 -> up_write(adev->reset_sem) for GPU 0 > > Thread B: down_write(adev->reset_sem) for GPU 1 -> > > down_write(adev->reset_sem) for GPU 0 -> ... -> > > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > > GPU 1 > > > > But lockdep still can detect recursive locking for this case: > > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > > down_write(adev->reset_sem) for GPU 0 > > Yeah, I guessed correctly what you're doing. My recommendation to use > down_write_nest_lock still stands. This is for reset only, kernel-wide reset > lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming > that information is known somewhere. > > The problem with manual lockdep annotations is that they increase complexity. > You have to keep all the annotations in mind, including justifcations and > which parts they still catch and which parts they don't catch. And there's > zero performance justification for a gpu reset path to create some fancy > lockdep special cases. > > Locking needs to be > 1. maintainable, i.e. 
every time you need to write a multi-paragraph > explanation like the above it's probably not. This obviously includes > correctness, but it's even more important that people can easily understand > what you're doing. > 2. fast enough, where it matters. gpu reset just doesn't. > > [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will > increase complexity. However my patch isn't for lockdep annotation, and in > fact it is to fix a bug. According to design of lockdep, every lock object > should has a separated lock_class_key which is an unified ID of lock object > in lockdep. Nope that 's not how lockdep work
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 5:32 PM Li, Dennis wrote: > > [AMD Public Use] > > On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > > > [AMD Public Use] > > > > Hi, Daniel, > > Thanks your review. And I also understand your concern. I guess you > > missed the description of this issue, so I paste it again in the below, and > > explain why this issue happens. > > > > For example, in a XGMI system with 2 GPU devices whose device entity > > is named adev. And each adev has a separated reset_sem. > > // device init function > > void device_init(adev) { > > ... > > init_rwsem(&adev->reset_sem); > > ... > > } > > > > XGMI system have two GPU devices, so system will call device_init > > twice. However the definition of init_rwsem macro has a limitation which > > use a local static lock_class_key to initialize rw_sem, which cause each > > adev->reset_sem share the same lock_class_key. > > > > #define init_rwsem(sem) > > \ > > do { \ > > static struct lock_class_key __key;\ > > \ > > __init_rwsem((sem), #sem, &__key); \ > > } while (0) > > > > And when GPU hang, we should do gpu recovery for all devices in the > > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > > accessed by other threads during recovery. The detailed recovery sequence > > as the following: > > // Step1: lock all GPU firstly: > > for each adev of GPU0 or GPU1 { > >down_write(adev->reset_sem); > >do_suspend(adev); > > } > > > > // Step2: > > do_hive_recovery(hive); > > > > // Step3: resume and unlock GPU > > for each adev of GPU0 or GPU1 { > > do_resume(adev) > > up_write(adev->reset_sem); > > } > > > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > > lockdep will pop the below warning. > > > > I have considered your proposal (using _nest_lock() ) before. Just as > > you said, _nest_lock() will hide true positive recursive locking. So I gave > > up it in the end. 
> > > > After reviewing codes of lockdep, I found the lockdep support > > dynamic_key, so using separated lock_class_key has no any side effect. In > > fact, init_rwsem also use dynamic_key. Please see the call sequence of > > init_rwsem and lockdep_set_class as the below: > >1) init_rwsem -> __init_rwsem -> lockdep_init_map; > >2) lockdep_set_class -> lockdep_init_map; > > > > Finally we go back to your concern, you maybe worry this change will > > cause the below dead-lock can't be detected. In fact, lockdep still is able > > to detect the issue as circular locking dependency, but there is no warning > > "recursive locking " in this case. > > Thread A: down_write(adev->reset_sem) for GPU 0 -> > > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) > > for GPU 1 -> up_write(adev->reset_sem) for GPU 0 > > Thread B: down_write(adev->reset_sem) for GPU 1 -> > > down_write(adev->reset_sem) for GPU 0 -> ... -> > > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > > GPU 1 > > > > But lockdep still can detect recursive locking for this case: > > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > > down_write(adev->reset_sem) for GPU 0 > > Yeah, I guessed correctly what you're doing. My recommendation to use > down_write_nest_lock still stands. This is for reset only, kernel-wide reset > lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming > that information is known somewhere. > > The problem with manual lockdep annotations is that they increase complexity. > You have to keep all the annotations in mind, including justifcations and > which parts they still catch and which parts they don't catch. And there's > zero performance justification for a gpu reset path to create some fancy > lockdep special cases. > > Locking needs to be > 1. maintainable, i.e. every time you need to write a multi-paragraph > explanation like the above it's probably not. 
This obviously includes > correctness, but it's even more important that people can easily understand > what you're doing. > 2. fast enough, where it matters. gpu reset just doesn't. > > [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will > increase complexity. However my patch isn't for lockdep annotation, and in > fact it is to fix a bug. According to design of lockdep, every lock object > should has a separated lock_class_key which is an unified ID of lock object > in lockdep. Nope that 's not how lockdep works. It intentionally shares the lockdep class. If every
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > [AMD Public Use] > > Hi, Daniel, > Thanks your review. And I also understand your concern. I guess you > missed the description of this issue, so I paste it again in the below, and > explain why this issue happens. > > For example, in a XGMI system with 2 GPU devices whose device entity is > named adev. And each adev has a separated reset_sem. > // device init function > void device_init(adev) { > ... > init_rwsem(&adev->reset_sem); > ... > } > > XGMI system have two GPU devices, so system will call device_init twice. > However the definition of init_rwsem macro has a limitation which use a local > static lock_class_key to initialize rw_sem, which cause each adev->reset_sem > share the same lock_class_key. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key;\ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > And when GPU hang, we should do gpu recovery for all devices in the > hive. Therefore we should lock each adev->reset_sem to protect GPU from be > accessed by other threads during recovery. The detailed recovery sequence as > the following: > // Step1: lock all GPU firstly: > for each adev of GPU0 or GPU1 { >down_write(adev->reset_sem); >do_suspend(adev); > } > > // Step2: > do_hive_recovery(hive); > > // Step3: resume and unlock GPU > for each adev of GPU0 or GPU1 { > do_resume(adev) > up_write(adev->reset_sem); > } > > Each adev->reset_sem has the same lock_class_key, and lockdep will take > them as the same rw_semaphore object. Therefore in step1, when lock GPU1, > lockdep will pop the below warning. > > I have considered your proposal (using _nest_lock() ) before. Just as > you said, _nest_lock() will hide true positive recursive locking. So I gave > up it in the end. > > After reviewing codes of lockdep, I found the lockdep support > dynamic_key, so using separated lock_class_key has no any side effect. In > fact, init_rwsem also use dynamic_key. 
Please see the call sequence of > init_rwsem and lockdep_set_class as the below: >1) init_rwsem -> __init_rwsem -> lockdep_init_map; >2) lockdep_set_class -> lockdep_init_map; > > Finally we go back to your concern, you maybe worry this change will > cause the below dead-lock can't be detected. In fact, lockdep still is able > to detect the issue as circular locking dependency, but there is no warning > "recursive locking " in this case. > Thread A: down_write(adev->reset_sem) for GPU 0 -> > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for > GPU 1 -> up_write(adev->reset_sem) for GPU 0 > Thread B: down_write(adev->reset_sem) for GPU 1 -> > down_write(adev->reset_sem) for GPU 0 -> ... -> > up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for > GPU 1 > > But lockdep still can detect recursive locking for this case: > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > down_write(adev->reset_sem) for GPU 0 Yeah, I guessed correctly what you're doing. My recommendation to use down_write_nest_lock still stands. This is for reset only, kernel-wide reset lock really shouldn't hurt. Or make it a lock per xgmi hive, I'm assuming that information is known somewhere. The problem with manual lockdep annotations is that they increase complexity. You have to keep all the annotations in mind, including justifcations and which parts they still catch and which parts they don't catch. And there's zero performance justification for a gpu reset path to create some fancy lockdep special cases. Locking needs to be 1. maintainable, i.e. every time you need to write a multi-paragraph explanation like the above it's probably not. This obviously includes correctness, but it's even more important that people can easily understand what you're doing. 2. fast enough, where it matters. gpu reset just doesn't. [Dennis Li] Yeah. I strongly agree that manual lockdep annotation will increase complexity. 
However my patch isn't for lockdep annotation; in fact it is to fix a bug. According to the design of lockdep, every lock object should have a separate lock_class_key, which is a unified ID of the lock object in lockdep. Take the previous sample code as an example: we define two rw_sems, a and b, and use case1 and case2 to init them. In my opinion, case1 and case2 should have the same behavior, but in fact they are different, because case2 uses the init_rwsem macro wrongly. Therefore we should use lockdep_set_class to fix this issue, as in case3. Case 1: init_
Re: [PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state
[AMD Official Use Only - Internal Distribution Only] Acked-by: Alex Deucher From: Quan, Evan Sent: Friday, August 7, 2020 5:30 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Felix ; Quan, Evan Subject: [PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state Correct the cached smu feature state on pp_features sysfs setting. Change-Id: Icc4c3ce764876a0ffdc86ad4c8a8b9c9f0ed0e97 Signed-off-by: Evan Quan --- .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 90c78f127f7e..acc926c20c55 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -984,10 +984,7 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) { struct vega20_hwmgr *data = (struct vega20_hwmgr *)(hwmgr->backend); - uint64_t features_enabled; - int i; - bool enabled; - int ret = 0; + int i, ret = 0; PP_ASSERT_WITH_CODE((ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_DisableAllSmuFeatures, @@ -995,17 +992,8 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) "[DisableAllSMUFeatures] Failed to disable all smu features!", return ret); - ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); - PP_ASSERT_WITH_CODE(!ret, - "[DisableAllSMUFeatures] Failed to get enabled smc features!", - return ret); - - for (i = 0; i < GNLD_FEATURES_MAX; i++) { - enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? 
- true : false; - data->smu_features[i].enabled = enabled; - data->smu_features[i].supported = enabled; - } + for (i = 0; i < GNLD_FEATURES_MAX; i++) + data->smu_features[i].enabled = 0; return 0; } @@ -3242,10 +3230,11 @@ static int vega20_get_ppfeature_status(struct pp_hwmgr *hwmgr, char *buf) static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfeature_masks) { - uint64_t features_enabled; - uint64_t features_to_enable; - uint64_t features_to_disable; - int ret = 0; + struct vega20_hwmgr *data = + (struct vega20_hwmgr *)(hwmgr->backend); + uint64_t features_enabled, features_to_enable, features_to_disable; + int i, ret = 0; + bool enabled; if (new_ppfeature_masks >= (1ULL << GNLD_FEATURES_MAX)) return -EINVAL; @@ -3274,6 +3263,17 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfe return ret; } + /* Update the cached feature enablement state */ + ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); + if (ret) + return ret; + + for (i = 0; i < GNLD_FEATURES_MAX; i++) { + enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? + true : false; + data->smu_features[i].enabled = enabled; + } + return 0; } -- 2.28.0
Re: [PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading
[AMD Official Use Only - Internal Distribution Only] Acked-by: Alex Deucher From: Quan, Evan Sent: Friday, August 7, 2020 5:31 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Quan, Evan Subject: [PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to assume the bootup state in SMU based on the dpm status. Change-Id: Ib88298ab9812d7d242592bcd55eea140bef6696a Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index acc926c20c55..da84012b7fd5 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1645,12 +1645,6 @@ static void vega20_init_powergate_state(struct pp_hwmgr *hwmgr) data->uvd_power_gated = true; data->vce_power_gated = true; - - if (data->smu_features[GNLD_DPM_UVD].enabled) - data->uvd_power_gated = false; - - if (data->smu_features[GNLD_DPM_VCE].enabled) - data->vce_power_gated = false; } static int vega20_enable_dpm_tasks(struct pp_hwmgr *hwmgr) -- 2.28.0
Re: [PATCH] drm/amdgpu: stop touching bo->tbo.ttm
On 2020-08-07 at 8:33 a.m., Christian König wrote: > This is not allocated any more for SG BOs. Can you point me at the relevant TTM changes that require this change? We'd need to test that the SG BO is still working as expected. A doorbell self-ring test or a GPU HDP flush test in KFDTest should do the trick. Regards, Felix > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 9015c7b76d60..2470b913038b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -1232,10 +1232,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( > domain_string(alloc_domain), ret); > goto err_bo_create; > } > - if (bo_type == ttm_bo_type_sg) { > + if (bo_type == ttm_bo_type_sg) > bo->tbo.sg = sg; > - bo->tbo.ttm->sg = sg; > - } > bo->kfd_bo = *mem; > (*mem)->bo = bo; > if (user_addr)
Re: [PATCH] drm/amdkfd: force raven as "dgpu" path
On 2020-08-07 at 4:25 a.m., Huang Rui wrote: > We still have a few iommu issues which need to address, so force raven > as "dgpu" path for the moment. > > Will enable IOMMUv2 since the issues are fixed. Do you mean "_when_ the issues are fixed"? The current iommuv2 troubles aside, I think this change breaks existing user mode. The existing Thunk is not prepared to see Raven as a dGPU. So this is not something we want to do in an upstream Linux kernel change. I suggest using the ignore_crat module parameter for the workaround instead. You may need to duplicate the raven_device_info and pick the right one depending on whether it is a dGPU or an APU. The only difference would be the need_iommu_device option. If ignore_crat is set, you can support Raven as a dGPU and require a corresponding Thunk change that conditionally supports Raven as a dGPU. I think such a change would also be the right direction for supporting Raven more universally in the future. It can be extended to conditionally treat Raven as a dGPU automatically in some situations: * broken or missing CRAT table * IOMMUv2 disabled Those are all situations where the current driver is broken anyway (and always has been), so it would not be a kernel change that breaks existing user mode. In addition the Thunk could be changed to downgrade a Raven APU to dGPU (by splitting the APU node into a separate CPU and dGPU node) if other dGPUs are present in the system to disable all the APU-specific code paths and allow all the GPUs to work together seamlessly with SVM. 
Regards, Felix > Signed-off-by: Huang Rui > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ > drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > index 6a250f8fcfb8..66d9f7280fe8 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c > @@ -22,6 +22,7 @@ > > #include > #include > +#include > #include "kfd_crat.h" > #include "kfd_priv.h" > #include "kfd_topology.h" > @@ -781,6 +782,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t > *size) > return -ENODATA; > } > > + /* Raven series will go with the fallback path to use virtual CRAT */ > + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && > + boot_cpu_data.x86 == 0x17) > + return -ENODATA; > + > pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL); > if (!pcrat_image) > return -ENOMEM; > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > index d5e790f046b4..93179c928371 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c > @@ -116,6 +116,7 @@ static const struct kfd_device_info carrizo_device_info = > { > .num_xgmi_sdma_engines = 0, > .num_sdma_queues_per_engine = 2, > }; > +#endif > > static const struct kfd_device_info raven_device_info = { > .asic_family = CHIP_RAVEN, > @@ -128,13 +129,12 @@ static const struct kfd_device_info raven_device_info = > { > .num_of_watch_points = 4, > .mqd_size_aligned = MQD_SIZE_ALIGNED, > .supports_cwsr = true, > - .needs_iommu_device = true, > + .needs_iommu_device = false, > .needs_pci_atomics = true, > .num_sdma_engines = 1, > .num_xgmi_sdma_engines = 0, > .num_sdma_queues_per_engine = 2, > }; > -#endif > > static const struct kfd_device_info hawaii_device_info = { > .asic_family = CHIP_HAWAII, ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On 2020-08-07 4:52 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: @@ -440,7 +431,7 @@ struct dm_crtc_state { #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base) struct dm_atomic_state { - struct drm_private_state base; + struct drm_atomic_state base; struct dc_state *context; Also curiosity: Can't we just embed dc_state here, instead of a pointer? Then it would become a lot more obvious that mostly this is a state object container like drm_atomic_state, but for the DC specific state structures. And we could look into moving the actual DC states into drm private states perhaps (if that helps with the code structure and overall flow). Maybe as next steps. -Daniel It's the refcounting that's the problem with this stuff. I'd like to move DC to a model where we have no memory allocation/ownership but that might be a bit of a more long term plan at this point. Same with dc_plane_state and dc_stream_state as well - these could exist on the DRM objects as long as they're not refcounted. Regards, Nicholas Kazlauskas ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/7] drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes
On 2020-08-07 4:30 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:38PM -0400, Nicholas Kazlauskas wrote: [Why] We're racing with userspace as the flags could potentially change from when we acquired and validated them in commit_check. Uh ... I didn't know these could change. I think my comments on Bas' series are even more relevant now. I think long term it would be best to bake these flags in at addfb time when modifiers aren't set. And otherwise always use the modifiers flag, and completely ignore the legacy flags here. -Daniel There's a set tiling/mod flags IOCTL that can be called after addfb happens, so unless there's some sort of driver magic preventing this from working when it's already been pinned for scanout, I don't see anything stopping this from happening. I still need to review the modifiers series in a little more detail, but that looks like a good approach to fixing these kinds of issues. Regards, Nicholas Kazlauskas [How] We unfortunately can't drop this function in its entirety from prepare_planes since we don't know the afb->address at commit_check time yet. So instead of querying new tiling_flags and tmz_surface, use the ones from the plane_state directly. While we're at it, also update the force_disable_dcc option based on the state from atomic check. 
Cc: Bhawanpreet Lakha Cc: Rodrigo Siqueira Signed-off-by: Nicholas Kazlauskas --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index bf1881bd492c..f78c09c9585e 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -5794,14 +5794,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, struct list_head list; struct ttm_validate_buffer tv; struct ww_acquire_ctx ticket; - uint64_t tiling_flags; uint32_t domain; int r; - bool tmz_surface = false; - bool force_disable_dcc = false; - - dm_plane_state_old = to_dm_plane_state(plane->state); - dm_plane_state_new = to_dm_plane_state(new_state); if (!new_state->fb) { DRM_DEBUG_DRIVER("No FB bound\n"); @@ -5845,27 +5839,35 @@ static int dm_plane_helper_prepare_fb(struct drm_plane *plane, return r; } - amdgpu_bo_get_tiling_flags(rbo, &tiling_flags); - - tmz_surface = amdgpu_bo_encrypted(rbo); - ttm_eu_backoff_reservation(&ticket, &list); afb->address = amdgpu_bo_gpu_offset(rbo); amdgpu_bo_ref(rbo); + /** +* We don't do surface updates on planes that have been newly created, +* but we also don't have the afb->address during atomic check. +* +* Fill in buffer attributes depending on the address here, but only on +* newly created planes since they're not being used by DC yet and this +* won't modify global state. 
+*/ + dm_plane_state_old = to_dm_plane_state(plane->state); + dm_plane_state_new = to_dm_plane_state(new_state); + if (dm_plane_state_new->dc_state && - dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { - struct dc_plane_state *plane_state = dm_plane_state_new->dc_state; + dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { + struct dc_plane_state *plane_state = + dm_plane_state_new->dc_state; + bool force_disable_dcc = !plane_state->dcc.enable; - force_disable_dcc = adev->asic_type == CHIP_RAVEN && adev->in_suspend; fill_plane_buffer_attributes( adev, afb, plane_state->format, plane_state->rotation, - tiling_flags, &plane_state->tiling_info, - &plane_state->plane_size, &plane_state->dcc, - &plane_state->address, tmz_surface, - force_disable_dcc); + dm_plane_state_new->tiling_flags, + &plane_state->tiling_info, &plane_state->plane_size, + &plane_state->dcc, &plane_state->address, + dm_plane_state_new->tmz_surface, force_disable_dcc); } return 0; -- 2.25.1 ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 5/7] drm/amd/display: Reset plane for anything that's not a FAST update
On 2020-08-07 4:34 a.m., dan...@ffwll.ch wrote: On Thu, Jul 30, 2020 at 04:36:40PM -0400, Nicholas Kazlauskas wrote: [Why] MEDIUM or FULL updates can require global validation or affect bandwidth. By treating these all simply as surface updates we aren't actually passing this through DC global validation. [How] There's currently no way to pass surface updates through DC global validation, nor do I think it's a good idea to change the interface to accept these. DC global validation itself is currently stateless, and we can move our update type checking to be stateless as well by duplicating DC surface checks in DM based on DRM properties. We wanted to rely on DC automatically determining this since DC knows best, but DM is ultimately what fills in everything into DC plane state so it does need to know as well. There are basically only three paths that we exercise in DM today: 1) Cursor (async update) 2) Pageflip (fast update) 3) Full pipe programming (medium/full updates) Which means that anything that's more than a pageflip really needs to go down path #3. So this change duplicates all the surface update checks based on DRM state instead inside of should_reset_plane(). Next step is dropping dm_determine_update_type_for_commit and we no longer require the old DC state at all for global validation. I think we do something similar in i915, where we have a "nothing except base address changed" fast path, but for anything else we fully compute a new state. Obviously you should try to keep global state synchronization to a minimum for this step, so it's not entirely only 2 options. Once we have the states, we compare them and figure out whether we can get away with a fast modeset operation (maybe what you guys call medium update). Anyway I think being slightly more aggressive with computing full state, and then falling back to more optimized update again is a good approach. Only risk is if we you have too much synchronization in your locking (e.g. 
modern compositors do like to change tiling and stuff, especially once you have modifiers enabled, so this shouldn't cause a sync across crtc except when absolutely needed). -Daniel Sounds like the right approach then. We can support tiling changes in the fast path, but the more optimized version of that last check is really linear <-> tiled. That requires global validation with DC to revalidate bandwidth and calculate requestor parameters for HW. So we'll have to stall for some of these changes unfortunately since we need the full HW state for validation. Regards, Nicholas Kazlauskas Optimization can come later so we don't reset DC planes at all for MEDIUM updates and avoid validation, but we might require some extra checks in DM to achieve this. Cc: Bhawanpreet Lakha Cc: Hersen Wu Signed-off-by: Nicholas Kazlauskas --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 25 +++ 1 file changed, 25 insertions(+) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 0d5f45742bb5..2cbb29199e61 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -8336,6 +8336,31 @@ static bool should_reset_plane(struct drm_atomic_state *state, if (old_other_state->crtc != new_other_state->crtc) return true; + /* Src/dst size and scaling updates. */ + if (old_other_state->src_w != new_other_state->src_w || + old_other_state->src_h != new_other_state->src_h || + old_other_state->crtc_w != new_other_state->crtc_w || + old_other_state->crtc_h != new_other_state->crtc_h) + return true; + + /* Rotation / mirroring updates. */ + if (old_other_state->rotation != new_other_state->rotation) + return true; + + /* Blending updates. */ + if (old_other_state->pixel_blend_mode != + new_other_state->pixel_blend_mode) + return true; + + /* Alpha updates. */ + if (old_other_state->alpha != new_other_state->alpha) + return true; + + /* Colorspace changes. 
*/ + if (old_other_state->color_range != new_other_state->color_range || + old_other_state->color_encoding != new_other_state->color_encoding) + return true; + /* Framebuffer checks fall at the end. */ if (!old_other_state->fb || !new_other_state->fb) continue; -- 2.25.1 ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesk
RFC: How to adjust the trace pid?
Hi everybody, in amdgpu we have the following issue, and I'm seeking advice on how to handle it cleanly. We have a bunch of trace points which are related to the VM subsystem and are executed in either a work item, a kthread, or a foreign process context. Tracing the pid of the context we are executing in is not really that useful, so I'm wondering if we could just overwrite the pid recorded in the trace entry? The following patch does exactly that for the vm_grab_id() trace point, but I'm not 100% sure whether that is legal or not. Any ideas? Comments? Thanks, Christian. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: adjust the pid in the grab_id trace point
Trace something useful instead of the pid of a kernel thread here. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h index 5da20fc166d9..07f99ef69d91 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h @@ -228,6 +228,7 @@ TRACE_EVENT(amdgpu_vm_grab_id, ), TP_fast_assign( + __entry->ent.pid = vm->task_info.pid; __entry->pasid = vm->pasid; __assign_str(ring, ring->name) __entry->vmid = job->vmid; -- 2.17.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: stop touching bo->tbo.ttm
This is not allocated any more for SG BOs. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 9015c7b76d60..2470b913038b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1232,10 +1232,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu( domain_string(alloc_domain), ret); goto err_bo_create; } - if (bo_type == ttm_bo_type_sg) { + if (bo_type == ttm_bo_type_sg) bo->tbo.sg = sg; - bo->tbo.ttm->sg = sg; - } bo->kfd_bo = *mem; (*mem)->bo = bo; if (user_addr) -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 1:59 PM Li, Dennis wrote: > > [AMD Public Use] > > Hi, Daniel, > Thanks for your review. And I also understand your concern. I guess you > missed the description of this issue, so I paste it again below, and > explain why this issue happens. > > For example, take an XGMI system with 2 GPU devices whose device entity is > named adev. Each adev has a separate reset_sem. > // device init function > void device_init(adev) { > ... > init_rwsem(&adev->reset_sem); > ... > } > > The XGMI system has two GPU devices, so the system will call device_init twice. > However the definition of the init_rwsem macro has a limitation: it uses a local > static lock_class_key to initialize the rw_sem, which causes each adev->reset_sem > to share the same lock_class_key. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key;\ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > And when a GPU hangs, we should do GPU recovery for all devices in the > hive. Therefore we should lock each adev->reset_sem to protect the GPU from being > accessed by other threads during recovery. The detailed recovery sequence is as > follows: > // Step1: lock all GPUs first: > for each adev of GPU0 or GPU1 { >down_write(adev->reset_sem); >do_suspend(adev); > } > > // Step2: > do_hive_recovery(hive); > > // Step3: resume and unlock GPUs > for each adev of GPU0 or GPU1 { > do_resume(adev) > up_write(adev->reset_sem); > } > > Each adev->reset_sem has the same lock_class_key, and lockdep will treat > them as the same rw_semaphore object. Therefore in step1, when locking GPU1, > lockdep will emit the warning below. > > I considered your proposal (using _nest_lock()) before. Just as > you said, _nest_lock() will hide true positive recursive locking. So I gave > up on it in the end. > > After reviewing the lockdep code, I found that lockdep supports > dynamic keys, so using separate lock_class_keys has no side effect. In > fact, init_rwsem also uses a dynamic key. 
Please see the call sequence of > init_rwsem and lockdep_set_class below: >1) init_rwsem -> __init_rwsem -> lockdep_init_map; >2) lockdep_set_class -> lockdep_init_map; > > Finally, back to your concern: you may worry that this change will > make the deadlock below undetectable. In fact, lockdep is still able > to detect the issue as a circular locking dependency, but there is no > "recursive locking" warning in this case. > Thread A: down_write(adev->reset_sem) for GPU 0 -> > down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for > GPU 1 -> up_write(adev->reset_sem) for GPU 0 > Thread B: down_write(adev->reset_sem) for GPU 1 -> > down_write(adev->reset_sem) for GPU 0 -> ... -> up_write(adev->reset_sem) for > GPU 0 -> up_write(adev->reset_sem) for GPU 1 > > But lockdep can still detect recursive locking for this case: > Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> > down_write(adev->reset_sem) for GPU 0 Yeah, I guessed correctly what you're doing. My recommendation to use down_write_nest_lock still stands. This is for reset only, so a kernel-wide reset lock really shouldn't hurt. Or make it a lock per xgmi hive; I'm assuming that information is known somewhere. The problem with manual lockdep annotations is that they increase complexity. You have to keep all the annotations in mind, including justifications and which parts they still catch and which parts they don't catch. And there's zero performance justification for a gpu reset path to create some fancy lockdep special cases. Locking needs to be 1. maintainable, i.e. every time you need to write a multi-paragraph explanation like the above it's probably not. This obviously includes correctness, but it's even more important that people can easily understand what you're doing. 2. fast enough, where it matters. gpu reset just doesn't. 
> [ 584.110304] > [ 584.110590] WARNING: possible recursive locking detected > [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE > [ 584.64] > [ 584.111456] kworker/38:1/553 is trying to acquire lock: > [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: > amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] > [ 584.112112] >but task is already holding lock: > [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: > amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] > [ 584.113068] >other info that might help us debu
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] Hi, Daniel, Thanks for your review. And I also understand your concern. I guess you missed the description of this issue, so I paste it again below, and explain why this issue happens. For example, take an XGMI system with 2 GPU devices whose device entity is named adev. Each adev has a separate reset_sem. // device init function void device_init(adev) { ... init_rwsem(&adev->reset_sem); ... } The XGMI system has two GPU devices, so the system will call device_init twice. However the definition of the init_rwsem macro has a limitation: it uses a local static lock_class_key to initialize the rw_sem, which causes each adev->reset_sem to share the same lock_class_key. #define init_rwsem(sem) \ do { \ static struct lock_class_key __key;\ \ __init_rwsem((sem), #sem, &__key); \ } while (0) And when a GPU hangs, we should do GPU recovery for all devices in the hive. Therefore we should lock each adev->reset_sem to protect the GPU from being accessed by other threads during recovery. The detailed recovery sequence is as follows: // Step1: lock all GPUs first: for each adev of GPU0 or GPU1 { down_write(adev->reset_sem); do_suspend(adev); } // Step2: do_hive_recovery(hive); // Step3: resume and unlock GPUs for each adev of GPU0 or GPU1 { do_resume(adev) up_write(adev->reset_sem); } Each adev->reset_sem has the same lock_class_key, and lockdep will treat them as the same rw_semaphore object. Therefore in step1, when locking GPU1, lockdep will emit the warning below. I considered your proposal (using _nest_lock()) before. Just as you said, _nest_lock() will hide true positive recursive locking. So I gave up on it in the end. After reviewing the lockdep code, I found that lockdep supports dynamic keys, so using separate lock_class_keys has no side effect. In fact, init_rwsem also uses a dynamic key. 
Please see the call sequence of init_rwsem and lockdep_set_class below: 1) init_rwsem -> __init_rwsem -> lockdep_init_map; 2) lockdep_set_class -> lockdep_init_map; Finally, back to your concern: you may worry that this change will make the deadlock below undetectable. In fact, lockdep is still able to detect the issue as a circular locking dependency, but there is no "recursive locking" warning in this case. Thread A: down_write(adev->reset_sem) for GPU 0 -> down_write(adev->reset_sem) for GPU 1 -> ... -> up_write(adev->reset_sem) for GPU 1 -> up_write(adev->reset_sem) for GPU 0 Thread B: down_write(adev->reset_sem) for GPU 1 -> down_write(adev->reset_sem) for GPU 0 -> ... -> up_write(adev->reset_sem) for GPU 0 -> up_write(adev->reset_sem) for GPU 1 But lockdep can still detect recursive locking for this case: Thread A: down_write(adev->reset_sem) for GPU 0 -> ...-> ...-> down_write(adev->reset_sem) for GPU 0 [ 584.110304] [ 584.110590] WARNING: possible recursive locking detected [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE [ 584.64] [ 584.111456] kworker/38:1/553 is trying to acquire lock: [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] but task is already holding lock: [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] other info that might help us debug this: [ 584.113689] Possible unsafe locking scenario: [ 584.114350]CPU0 [ 584.114685] [ 584.115014] lock(&adev->reset_sem); [ 584.115349] lock(&adev->reset_sem); [ 584.115678] *** DEADLOCK *** [ 584.116624] May be due to missing lock nesting notation [ 584.117284] 4 locks held by kworker/38:1/553: [ 584.117616] #0: 9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630 [ 584.117967] #1: ac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630 [ 584.118358] #2: c1c2a5d0 
(&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [ 584.118786] #3: 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] Best Regards Dennis Li -Original Message- From: Daniel Vetter Sent: Friday, August 7, 2020 5:45 PM To: Koenig, Christian Cc: Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: annotate a false positive r
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:34 AM Christian König wrote: > > On 07.08.20 at 11:23, Daniel Vetter wrote: > > On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: > >> On Fri, Aug 7, 2020 at 11:08 AM Christian König > >> wrote: > >>> [SNIP] > >>> What we should do instead is to make sure we have only a single lock > >>> for the complete hive. > >>> [Dennis Li] If we use a single lock, users must wait for all > >>> devices to resume successfully, but in fact their tasks are only > >>> running on device A. It is not friendly to users. > >> Well that is intentional I would say. We can only restart submissions > >> after all devices are resumed successfully, because otherwise it can > >> happen that a job on device A depends on memory on device B which > >> isn't accessible. > >> [Dennis Li] Yes, you are right. The driver must make sure that the shared > >> resources (such as the shared memory) are ready before unlocking the lock > >> of each adev one by one. But each device still has private hardware > >> resources such as video and display. > > Yeah, but those are negligible and we have a team working on display > > support for XGMI. So this will sooner or later become a problem as well. > > > > I still think that a single rwsem for the whole hive is still the best > > option here. > > > > [Dennis Li] For your above concern, we can open a new thread to discuss > > it. If we make a decision to use a single lock after discussing, we can > > create another patch for it. > That's a really good argument, but I still hesitate to merge this patch. > How severe is the lockdep splat? > > At bare minimum we need a "/* TODO: " comment explaining why we do this and how > to remove the workaround again. > [Dennis Li] It is not a workaround. According to the design of lockdep, each > rw_semaphore should have a separate lock_class_key. I have explained > that init_rwsem has the above limitation, so we must correct it. > >>> Yeah, but this explanation is not really sufficient. 
Again this is not > >>> a limitation of init_rwsem, but intentional behavior. > >>> > >>> In other words, in most cases lockdep should splat if you use rw > >>> semaphores like this. > >>> > >>> That we avoid this by locking multiple amdgpu devices always in the same > >>> order is rather questionable driver design. > >> Yeah don't give multiple locking classes like that, that's fairly > >> terrible. > > Ok, that is exactly the answer I feared we would get. > > >> The right fix is mutex_lock_nest_lock, which we're using in > >> ww_mutex and which is also used in mm_take_all_locks. > > Ah! Enlightenment! So in this case we should use down_write_nested() in > this special space and all is well? Nope. There are two kinds of nesting that lockdep supports: 1. _nested() variants, where you supply an integer parameter specifying the lockdep subclass. That's like setting a locking subclass at lock creation time, except much worse because you change the lockdep class at runtime. This can lead to stuff like: struct mutex A; mutex_init(&A); mutex_lock(&A); mutex_lock_nested(&A, SINGLE_DEPTH_NESTING); which is a very clear deadlock, but lockdep will think it's all fine. Don't use that. 2. _nest_lock() variants, where you specify a super-lock, which makes sure that all sub-locks cannot be acquired concurrently (or you promise there's some other trick to avoid deadlocks). Lockdep checks that you've acquired the super-lock, and then allows arbitrary nesting. This is what ww_mutex uses, and what mm_take_all_locks uses. If the super-lock is an actual mutex, then this is actually deadlock-free. With ww_mutex it's a fake lockdep_map in the ww_acquire_ctx, but we guarantee deadlock-freeness through the ww algorithm. The fake super-lock for ww_mutex is acquired when you get the ticket. No 2 is the thing you want, not no 1. 
> >> > >> If you solve the locking ordering by sorting all the locks you need > >> (ugh), then you can also just use a fake lockdep_map for your special > >> function only. > >> > >> Also in general the idea of an rwsem in conjunction with xgmi cross > >> device memory handling just plain freaks me out :-) Please make sure > >> you prime this lock against dma_resv and dma_fence (and maybe also > >> drm_modeset_lock if this lock is outside of dma_resv), and ideally > >> this should only ever be used for setup stuff when loading drivers. I > >> guess that's why you went with a per-device rwsem. If it's really only > >> for setup then just do the simple thing of having an > >> xgmi_hive_reconfigure lock, which all the rwsems nest within, and no > >> fancy lock ordering or other tricks. > > Well this is for the group reset of all devices in a hive, and you need > to reboot to change the order the devices are in the list. Ok, that sounds safe enough. I'd go with an xgmi_hive_lock then, and down_write_nest_lock(&adev->xgmi_rwsem, &xgm
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On 07.08.20 at 11:23, Daniel Vetter wrote: On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: On Fri, Aug 7, 2020 at 11:08 AM Christian König wrote: [SNIP] What we should do instead is to make sure we have only a single lock for the complete hive. [Dennis Li] If we use a single lock, users must wait for all devices to resume successfully, but in fact their tasks are only running on device A. It is not friendly to users. Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, because otherwise it can happen that a job on device A depends on memory on device B which isn't accessible. [Dennis Li] Yes, you are right. The driver must make sure that the shared resources (such as the shared memory) are ready before unlocking the lock of each adev one by one. But each device still has private hardware resources such as video and display. Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well. I still think that a single rwsem for the whole hive is still the best option here. [Dennis Li] For your above concern, we can open a new thread to discuss it. If we make a decision to use a single lock after discussing, we can create another patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment explaining why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. Yeah, but this explanation is not really sufficient. Again this is not a limitation of init_rwsem, but intentional behavior. In other words, in most cases lockdep should splat if you use rw semaphores like this. 
That we avoid this by locking multiple amdgpu devices always in the same order is rather questionable driver design. Yeah don't give multiple locking classes like that, that's fairly terrible. Ok, that is exactly the answer I feared we would get. The right fix is mutex_lock_nest_lock, which we're using in ww_mutex and which is also used in mm_take_all_locks. Ah! Enlightenment! So in this case we should use down_write_nested() in this special space and all is well? If you solve the locking ordering by sorting all the locks you need (ugh), then you can also just use a fake lockdep_map for your special function only. Also in general the idea of an rwsem in conjunction with xgmi cross device memory handling just plain freaks me out :-) Please make sure you prime this lock against dma_resv and dma_fence (and maybe also drm_modeset_lock if this lock is outside of dma_resv), and ideally this should only ever be used for setup stuff when loading drivers. I guess that's why you went with a per-device rwsem. If it's really only for setup then just do the simple thing of having an xgmi_hive_reconfigure lock, which all the rwsems nest within, and no fancy lock ordering or other tricks. Well this is for the group reset of all devices in a hive, and you need to reboot to change the order the devices are in the list. Regards, Christian. The core network driver (net/core/dev.c) has a similar use case to ours. Just took a look at that, but this is a completely different use case. The lockdep classes are grouped by driver type to note the difference in the network stack. A quick search showed that no other device driver is doing this in the kernel, so I'm not sure such a behavior wouldn't be rejected. Adding Daniel to comment on this as well. Felix had some really good arguments to make that an urgent fix, so either we come up with some real rework of this or at least add a big "/* TODO*/". 
Aside from "I have questions" I don't think there's any reason not to fix this correctly with mutex_lock_nest_lock. It really shouldn't be a semantic change from what I understand here. Also one more with my drm maintainer hat on, as a heads-up: Dave and I looked at i915 gem code a bit too much, and we're appalled at the massive over-use of lockdep class hacks and dubious nesting annotations. Expect a crackdown on anyone who tries to play clever tricks here, we have a few too many already :-) So no mutex_lock_nested (totally different beast from mutex_lock_nest_lock, and really unsafe since it's a runtime change of the lockdep class) and also no lockdep_set_class trickery or anything like that. We need locking hierarchies which mere humans can comprehend and review. Cheers, Daniel Regards, Christian. -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch/
[PATCH] drm/amd/powerplay: correct UVD/VCE PG state on custom pptable uploading
The UVD/VCE PG state is managed by UVD and VCE IP. It's error-prone to assume the bootup state in SMU based on the dpm status. Change-Id: Ib88298ab9812d7d242592bcd55eea140bef6696a Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index acc926c20c55..da84012b7fd5 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -1645,12 +1645,6 @@ static void vega20_init_powergate_state(struct pp_hwmgr *hwmgr) data->uvd_power_gated = true; data->vce_power_gated = true; - - if (data->smu_features[GNLD_DPM_UVD].enabled) - data->uvd_power_gated = false; - - if (data->smu_features[GNLD_DPM_VCE].enabled) - data->vce_power_gated = false; } static int vega20_enable_dpm_tasks(struct pp_hwmgr *hwmgr) -- 2.28.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amd/powerplay: correct Vega20 cached smu feature state
Correct the cached smu feature state on pp_features sysfs setting. Change-Id: Icc4c3ce764876a0ffdc86ad4c8a8b9c9f0ed0e97 Signed-off-by: Evan Quan --- .../drm/amd/powerplay/hwmgr/vega20_hwmgr.c| 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c index 90c78f127f7e..acc926c20c55 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c @@ -984,10 +984,7 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) { struct vega20_hwmgr *data = (struct vega20_hwmgr *)(hwmgr->backend); - uint64_t features_enabled; - int i; - bool enabled; - int ret = 0; + int i, ret = 0; PP_ASSERT_WITH_CODE((ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_DisableAllSmuFeatures, @@ -995,17 +992,8 @@ static int vega20_disable_all_smu_features(struct pp_hwmgr *hwmgr) "[DisableAllSMUFeatures] Failed to disable all smu features!", return ret); - ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); - PP_ASSERT_WITH_CODE(!ret, - "[DisableAllSMUFeatures] Failed to get enabled smc features!", - return ret); - - for (i = 0; i < GNLD_FEATURES_MAX; i++) { - enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? 
- true : false; - data->smu_features[i].enabled = enabled; - data->smu_features[i].supported = enabled; - } + for (i = 0; i < GNLD_FEATURES_MAX; i++) + data->smu_features[i].enabled = 0; return 0; } @@ -3242,10 +3230,11 @@ static int vega20_get_ppfeature_status(struct pp_hwmgr *hwmgr, char *buf) static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfeature_masks) { - uint64_t features_enabled; - uint64_t features_to_enable; - uint64_t features_to_disable; - int ret = 0; + struct vega20_hwmgr *data = + (struct vega20_hwmgr *)(hwmgr->backend); + uint64_t features_enabled, features_to_enable, features_to_disable; + int i, ret = 0; + bool enabled; if (new_ppfeature_masks >= (1ULL << GNLD_FEATURES_MAX)) return -EINVAL; @@ -3274,6 +3263,17 @@ static int vega20_set_ppfeature_status(struct pp_hwmgr *hwmgr, uint64_t new_ppfe return ret; } + /* Update the cached feature enablement state */ + ret = vega20_get_enabled_smc_features(hwmgr, &features_enabled); + if (ret) + return ret; + + for (i = 0; i < GNLD_FEATURES_MAX; i++) { + enabled = (features_enabled & data->smu_features[i].smu_feature_bitmap) ? + true : false; + data->smu_features[i].enabled = enabled; + } + return 0; } -- 2.28.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
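The cached-state refresh the patch moves into vega20_set_ppfeature_status() is just a bitmask-to-flags sweep. A minimal userspace sketch of that loop, with a simplified stand-in for the driver's feature struct (the real code lives in vega20_hwmgr.c and uses `smu_features[i].smu_feature_bitmap`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the vega20_hwmgr feature table entry. */
struct smu_feature {
    uint64_t smu_feature_bitmap; /* bit this feature occupies in the SMC mask */
    bool enabled;                /* cached enablement state */
};

/* Mirrors the loop the patch adds after applying a new ppfeature mask:
 * refresh every cached 'enabled' flag from the mask the firmware reports. */
static void update_cached_feature_state(struct smu_feature *features,
                                        int count, uint64_t features_enabled)
{
    for (int i = 0; i < count; i++)
        features[i].enabled =
            (features_enabled & features[i].smu_feature_bitmap) != 0;
}
```

The point of the patch is that this refresh runs against what the SMC actually reports, rather than assuming the request succeeded verbatim.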
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:20 AM Daniel Vetter wrote: > > On Fri, Aug 7, 2020 at 11:08 AM Christian König > wrote: > > > > [SNIP] > > What we should do instead is to make sure we have only a single lock > > for the complete hive instead. > > [Dennis Li] If we use a single lock, users will must wait for all > > devices resuming successfully, but in fact their tasks are only > > running in device a. It is not friendly to users. > > >>> Well that is intentional I would say. We can only restart submissions > > >>> after all devices are resumed successfully, cause otherwise it can > > >>> happen that a job on device A depends on memory on device B which isn't > > >>> accessible. > > >>> [Dennis Li] Yes, you are right. Driver have make sure that the shared > > >>> resources(such as the shard memory) are ready before unlock the lock of > > >>> adev one by one. But each device still has private hardware resources > > >>> such as video and display. > > >> Yeah, but those are negligible and we have a team working on display > > >> support for XGMI. So this will sooner or later become a problem as well. > > >> > > >> I still think that a single rwsem for the whole hive is still the best > > >> option here. > > >> > > >> [Dennis Li] For your above concern, we can open a new thread to discuss > > >> it. If we make a decision to use a single after discussing, we can > > >> create another patch for it. > > > That's a really good argument, but I still hesitate to merge this patch. > > > How severe is the lockdep splat? > > > > > > At bare minimum we need a "/* TODO: " comment why we do this and how > > > to remove the workaround again. > > > [Dennis Li] It is not a workaround. According to design of lockdep, each > > > rw_semaphore should has a separated lock_class_key. I have explained that > > > init_rwsem has the above limitation, so we must correct it. > > > > Yeah, but this explanation is not really sufficient. 
Again this is not > > an limitation of init_rwsem, but intentional behavior. > > > > In other words in most cases lockdep should splat if you use rw > > semaphores like this. > > > > That we avoid this by locking multiple amdgpu device always in the same > > order is rather questionable driver design. > > Yeah don't give multiple locking classes like that, that's fairly > terrible. The right fix is mutex_lock_nest_lock, which we're using in > ww_mutex or in which is also used in mm_take_all_locks. > > If you solve the locking ordering by sorting all the locks you need > (ugh), then you can also just use a fake lockdep_map for your special > function only. > > Also in general the idea of an rwsem in conjunction with xgmi cross > device memory handling just plain freaks me out :-) Please make sure > you prime this lock against dma_resv and dma_fence (and maybe also > drm_modeset_lock if this lock is outside of dma_resv), and ideally > this should only ever be used for setup stuff when loading drivers. I > guess that's why you went with a per-device rwsem. If it's really only > for setup then just do the simple thing of having an > xgmi_hive_reconfigure lock, which all the rwsems nest within, and no > fancy lock ordering or other tricks. > > > > Core network driver (net/core/dev.c) has the similar use case with us. > > > > Just took a look at that, but this is completely different use case. The > > lockdep classes are grouped by driver type to note the difference in the > > network stack. > > > > A quick search showed that no other device driver is doing this in the > > kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding > > Daniel to comment on this as well. > > > > Felix had some really good arguments to make that an urgent fix, so > > either we come up with some real rework of this or at least add a big > > "/* TODO*/". > > Aside from "I have questions" I don't think there's any reason to no > fix this correctly with mutex_lock_nest_lock. 
Really shouldn't be a > semantic change from what I understand here. Also one more with my drm maintainer hat on, as a heads-up: Dave&me looked at i915 gem code a bit too much, and we're appalled at the massive over-use of lockdep class hacks and dubious nesting annotations. Expect crack down on anyone who tries to play clever tricks here, we have a few too many already :-) So no mutex_lock_nested (totally different beast from mutex_lock_nest_lock, and really unsafe since it's a runtime change of the lockdep class) and also not any lockdep_set_class trickery or anything like that. We need locking hierarchies which mere humans can comprehend and review. Cheers, Daniel > > Cheers, Daniel > > > > > Regards, > > Christian. > > > > > > > > Regards, > > > Christian. > > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
On Fri, Aug 7, 2020 at 11:08 AM Christian König wrote: > > [SNIP] > What we should do instead is to make sure we have only a single lock for > the complete hive instead. > [Dennis Li] If we use a single lock, users will must wait for all > devices resuming successfully, but in fact their tasks are only running > in device a. It is not friendly to users. > >>> Well that is intentional I would say. We can only restart submissions > >>> after all devices are resumed successfully, cause otherwise it can happen > >>> that a job on device A depends on memory on device B which isn't > >>> accessible. > >>> [Dennis Li] Yes, you are right. Driver have make sure that the shared > >>> resources(such as the shard memory) are ready before unlock the lock of > >>> adev one by one. But each device still has private hardware resources > >>> such as video and display. > >> Yeah, but those are negligible and we have a team working on display > >> support for XGMI. So this will sooner or later become a problem as well. > >> > >> I still think that a single rwsem for the whole hive is still the best > >> option here. > >> > >> [Dennis Li] For your above concern, we can open a new thread to discuss > >> it. If we make a decision to use a single after discussing, we can create > >> another patch for it. > > That's a really good argument, but I still hesitate to merge this patch. > > How severe is the lockdep splat? > > > > At bare minimum we need a "/* TODO: " comment why we do this and how to > > remove the workaround again. > > [Dennis Li] It is not a workaround. According to design of lockdep, each > > rw_semaphore should has a separated lock_class_key. I have explained that > > init_rwsem has the above limitation, so we must correct it. > > Yeah, but this explanation is not really sufficient. Again this is not > an limitation of init_rwsem, but intentional behavior. > > In other words in most cases lockdep should splat if you use rw > semaphores like this. 
> > That we avoid this by locking multiple amdgpu device always in the same > order is rather questionable driver design. Yeah don't give multiple locking classes like that, that's fairly terrible. The right fix is mutex_lock_nest_lock, which we're using in ww_mutex or in which is also used in mm_take_all_locks. If you solve the locking ordering by sorting all the locks you need (ugh), then you can also just use a fake lockdep_map for your special function only. Also in general the idea of an rwsem in conjunction with xgmi cross device memory handling just plain freaks me out :-) Please make sure you prime this lock against dma_resv and dma_fence (and maybe also drm_modeset_lock if this lock is outside of dma_resv), and ideally this should only ever be used for setup stuff when loading drivers. I guess that's why you went with a per-device rwsem. If it's really only for setup then just do the simple thing of having an xgmi_hive_reconfigure lock, which all the rwsems nest within, and no fancy lock ordering or other tricks. > > Core network driver (net/core/dev.c) has the similar use case with us. > > Just took a look at that, but this is completely different use case. The > lockdep classes are grouped by driver type to note the difference in the > network stack. > > A quick search showed that no other device driver is doing this in the > kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding > Daniel to comment on this as well. > > Felix had some really good arguments to make that an urgent fix, so > either we come up with some real rework of this or at least add a big > "/* TODO*/". Aside from "I have questions" I don't think there's any reason to no fix this correctly with mutex_lock_nest_lock. Really shouldn't be a semantic change from what I'm understand here. Cheers, Daniel > > Regards, > Christian. > > > > > Regards, > > Christian. 
> > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
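The "sorting all the locks you need" option Daniel mentions can be shown with a small userspace analogue (pthreads here, not driver code — the real implementation would use per-device rwsems and lockdep annotations): a hive-wide operation takes every per-device lock in one fixed order, here ascending pointer address, so two concurrent hive-wide operations can never acquire the same pair of locks in opposite orders and deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in an XGMI hive. */
struct fake_device {
    pthread_mutex_t lock;
};

/* Sort the lock list into a canonical (address) order. */
static void sort_by_address(struct fake_device **devs, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (devs[j] < devs[i]) {
                struct fake_device *tmp = devs[i];
                devs[i] = devs[j];
                devs[j] = tmp;
            }
}

/* Take every per-device lock in ascending address order, regardless of the
 * order the caller listed the devices in. */
static void hive_lock_all(struct fake_device **devs, int n)
{
    sort_by_address(devs, n);
    for (int i = 0; i < n; i++)
        pthread_mutex_lock(&devs[i]->lock);
}

/* Release in reverse acquisition order. */
static void hive_unlock_all(struct fake_device **devs, int n)
{
    for (int i = n - 1; i >= 0; i--)
        pthread_mutex_unlock(&devs[i]->lock);
}
```

As the thread notes, lockdep cannot see that the order is globally consistent, which is exactly why mutex_lock_nest_lock (acquisition under one outer lock) is the preferred annotation over per-device lock classes.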
Re: [PATCH] drm/amdgpu: make sure userptr ttm is allocated
On 2020-08-06 2:56 p.m., Christian König wrote: > We need to allocate that manually now. > > Signed-off-by: Christian König > Fixes: 2ddef17678bc (HEAD) drm/ttm: make TT creation purely optional v3 > --- > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c| 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 17 +++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 ++-- > 4 files changed, 15 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > index 9015c7b76d60..55d2e870fdda 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > @@ -564,7 +564,7 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t > user_addr) > > mutex_lock(&process_info->lock); > > - ret = amdgpu_ttm_tt_set_userptr(bo->tbo.ttm, user_addr, 0); > + ret = amdgpu_ttm_tt_set_userptr(&bo->tbo, user_addr, 0); > if (ret) { > pr_err("%s: Failed to set userptr: %d\n", __func__, ret); > goto out; I suspect more needs to be done for KFD, e.g. amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu can dereference bo->tbo.ttm before calling init_user_pages, and there are more usages of bo->tbo.ttm that need to be audited. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[SNIP] What we should do instead is to make sure we have only a single lock for the complete hive instead. [Dennis Li] If we use a single lock, users will must wait for all devices resuming successfully, but in fact their tasks are only running in device a. It is not friendly to users. Well that is intentional I would say. We can only restart submissions after all devices are resumed successfully, cause otherwise it can happen that a job on device A depends on memory on device B which isn't accessible. [Dennis Li] Yes, you are right. Driver have make sure that the shared resources(such as the shard memory) are ready before unlock the lock of adev one by one. But each device still has private hardware resources such as video and display. Yeah, but those are negligible and we have a team working on display support for XGMI. So this will sooner or later become a problem as well. I still think that a single rwsem for the whole hive is still the best option here. [Dennis Li] For your above concern, we can open a new thread to discuss it. If we make a decision to use a single after discussing, we can create another patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to design of lockdep, each rw_semaphore should has a separated lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. Yeah, but this explanation is not really sufficient. Again this is not an limitation of init_rwsem, but intentional behavior. In other words in most cases lockdep should splat if you use rw semaphores like this. That we avoid this by locking multiple amdgpu device always in the same order is rather questionable driver design. Core network driver (net/core/dev.c) has the similar use case with us. 
Just took a look at that, but this is completely different use case. The lockdep classes are grouped by driver type to note the difference in the network stack. A quick search showed that no other device driver is doing this in the kernel, so I'm not sure if such a behavior wouldn't be rejected. Adding Daniel to comment on this as well. Felix had some really good arguments to make that an urgent fix, so either we come up with some real rework of this or at least add a big "/* TODO*/". Regards, Christian. Regards, Christian.
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: > @@ -440,7 +431,7 @@ struct dm_crtc_state { > #define to_dm_crtc_state(x) container_of(x, struct dm_crtc_state, base) > > struct dm_atomic_state { > - struct drm_private_state base; > + struct drm_atomic_state base; > > struct dc_state *context; Also curiosity: Can't we just embed dc_state here, instead of a pointer? Then it would become a lot more obvious that mostly this is a state object container like drm_atomic_state, but for the DC specific state structures. And we could look into moving the actual DC states into drm private states perhaps (if that helps with the code structure and overall flow). Maybe as next steps. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PATCH] drm/amdgpu: Skip some registers config for SRIOV
Some registers are not accessible to virtual function setup, so skip their initialization when in VF-SRIOV mode. v2: move SRIOV VF check into specify functions; modify commit description and comment. Signed-off-by: Liu ChengZhe --- drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c | 19 +++ drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 19 +++ 2 files changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c index 1f6112b7fa49..80c906a0383f 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_1.c @@ -182,6 +182,12 @@ static void gfxhub_v2_1_init_cache_regs(struct amdgpu_device *adev) { uint32_t tmp; + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + /* Setup L2 cache */ tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_CNTL); tmp = REG_SET_FIELD(tmp, GCVM_L2_CNTL, ENABLE_L2_CACHE, 1); @@ -237,6 +243,12 @@ static void gfxhub_v2_1_enable_system_domain(struct amdgpu_device *adev) static void gfxhub_v2_1_disable_identity_aperture(struct amdgpu_device *adev) { + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, 0x); WREG32_SOC15(GC, 0, mmGCVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_HI32, @@ -373,6 +385,13 @@ void gfxhub_v2_1_set_fault_enable_default(struct amdgpu_device *adev, bool value) { u32 tmp; + + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. 
+*/ + if (amdgpu_sriov_vf(adev)) + return; + tmp = RREG32_SOC15(GC, 0, mmGCVM_L2_PROTECTION_FAULT_CNTL); tmp = REG_SET_FIELD(tmp, GCVM_L2_PROTECTION_FAULT_CNTL, RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c index d83912901f73..8acb3b625afe 100644 --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c @@ -181,6 +181,12 @@ static void mmhub_v2_0_init_cache_regs(struct amdgpu_device *adev) { uint32_t tmp; + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + /* Setup L2 cache */ tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_CNTL); tmp = REG_SET_FIELD(tmp, MMVM_L2_CNTL, ENABLE_L2_CACHE, 1); @@ -236,6 +242,12 @@ static void mmhub_v2_0_enable_system_domain(struct amdgpu_device *adev) static void mmhub_v2_0_disable_identity_aperture(struct amdgpu_device *adev) { + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + WREG32_SOC15(MMHUB, 0, mmMMVM_L2_CONTEXT1_IDENTITY_APERTURE_LOW_ADDR_LO32, 0x); @@ -365,6 +377,13 @@ void mmhub_v2_0_gart_disable(struct amdgpu_device *adev) void mmhub_v2_0_set_fault_enable_default(struct amdgpu_device *adev, bool value) { u32 tmp; + + /* These registers are not accessible to VF-SRIOV. +* The PF will program them instead. +*/ + if (amdgpu_sriov_vf(adev)) + return; + tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_L2_PROTECTION_FAULT_CNTL); tmp = REG_SET_FIELD(tmp, MMVM_L2_PROTECTION_FAULT_CNTL, RANGE_PROTECTION_FAULT_ENABLE_DEFAULT, value); -- 2.25.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
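The patch above combines two common patterns: a read-modify-write of a register field (the RREG32_SOC15()/REG_SET_FIELD()/WREG32_SOC15() sequences) and a guard clause that skips the programming entirely on a virtual function. A generic sketch of both — the names and the bit layout here are illustrative, not the real amdgpu macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Update one field of a register value, preserving all other bits.
 * This is what REG_SET_FIELD()-style macros expand to, conceptually. */
static inline uint32_t set_field(uint32_t reg, uint32_t mask,
                                 unsigned int shift, uint32_t value)
{
    return (reg & ~mask) | ((value << shift) & mask);
}

/* Guard-clause pattern from gfxhub_v2_1_init_cache_regs(): bail out first
 * when running as an SR-IOV virtual function, because the PF owns these
 * registers. */
static void init_cache_regs(bool is_sriov_vf, uint32_t *l2_cntl)
{
    if (is_sriov_vf)
        return; /* PF programs these registers instead */

    /* e.g. an ENABLE_L2_CACHE field in bit 0 (illustrative layout) */
    *l2_cntl = set_field(*l2_cntl, 0x1u, 0, 1);
}
```

Putting the VF check inside each function (rather than at the callers) keeps the "registers the PF owns" knowledge in one place per register block, which is what v2 of the patch does.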
Re: [PATCH 7/7] drm/amd/display: Replace DRM private objects with subclassed DRM atomic state
On Thu, Jul 30, 2020 at 04:36:42PM -0400, Nicholas Kazlauskas wrote: > [Why] > DM atomic check was structured in a way that we required old DC state > in order to dynamically add and remove planes and streams from the > context to build the DC state context for validation. > > DRM private objects were used to carry over the last DC state and > were added to the context on nearly every commit - regardless of fast > or full so we could check whether or not the new state could affect > bandwidth. > > The problem with this model is that DRM private objects do not > implicitly stall out other commits. > > So if you have two commits touching separate DRM objects they could > run concurrently and potentially execute out of order - leading to a > use-after-free. > > If we want this to be safe we have two options: > 1. Stall out concurrent commits since they touch the same private object > 2. Refactor DM to not require old DC state and drop private object usage > > [How] > This implements approach #2 since it still allows for judder free > updates in multi-display scenarios. > > I'll list the big changes in order at a high level: > > 1. Subclass DRM atomic state instead of using DRM private objects. > > DC relied on the old state to determine which changes cause bandwidth > updates but now we have DM perform similar checks based on DRM state > instead - dropping the requirement for old state to exist at all. > > This means that we can now build a new DC context from scratch whenever > we have something that DM thinks could affect bandwidth. > > Whenever we need to rebuild bandwidth we now add all CRTCs and planes > to the DRM state in order to get the absolute set of DC streams and > DC planes. > > This introduces a stall on other commits, but this stall already > exists because of the lock_and_validation logic and it's necessary > since updates may move around pipes and require full reprogramming. 
Hm I think long term would be neat if you can first just add the dc streams for the current planes (if you convert them individually to private state objects), and ask DC to recompute specifics just for that. If DC then says "yes I need an even bigger recompute" only then do you grab all the streams and everything else. The idea is that essentially you treat individual stream objects as read-locks on that part of the overall global state, and only when you need to do a write do you grab all the "read locks" to do the update. But might not actually help for your hw. Just highlighting this as a pattern we've done. -Daniel > > 2. Drop workarounds to add planes to maintain z-order early in atomic > check. This is no longer needed because of the changes for (1). > > This also involves fixing up should_plane_reset checks since we can just > avoid resetting streams and planes when they haven't actually changed. > > 3. Rework dm_update_crtc_state and dm_update_plane_state to be single > pass instead of two pass. > > This is necessary since we no longer have the dc_state to add and > remove planes to the context in and we want to defer creation to the > end of commit_check. > > It also makes the logic a lot simpler to follow since as an added bonus. 
> > Cc: Bhawanpreet Lakha > Cc: Harry Wentland > Cc: Leo Li > Cc: Daniel Vetter > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 720 +++--- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 11 +- > 2 files changed, 280 insertions(+), 451 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 59829ec81694..97a7dfc620e8 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -1839,7 +1839,6 @@ static int dm_resume(void *handle) > struct drm_plane *plane; > struct drm_plane_state *new_plane_state; > struct dm_plane_state *dm_new_plane_state; > - struct dm_atomic_state *dm_state = > to_dm_atomic_state(dm->atomic_obj.state); > enum dc_connection_type new_connection_type = dc_connection_none; > struct dc_state *dc_state; > int i, r, j; > @@ -1879,11 +1878,6 @@ static int dm_resume(void *handle) > > return 0; > } > - /* Recreate dc_state - DC invalidates it when setting power state to > S3. */ > - dc_release_state(dm_state->context); > - dm_state->context = dc_create_state(dm->dc); > - /* TODO: Remove dc_state->dccg, use dc->dccg directly. */ > - dc_resource_state_construct(dm->dc, dm_state->context); > > /* Before powering on DC we need to re-initialize DMUB. */ > r = dm_dmub_hw_init(adev); > @@ -2019,11 +2013,51 @@ const struct amdgpu_ip_block_version dm_ip_block = > * *WIP* > */ > > +struct drm_atomic_state *dm_atomic_state_alloc(struct drm_device *dev) > +{ > + struct dm_atomic_state *dm_state; > + > + dm_state = kzalloc(sizeof(*dm_st
Re: [PATCH 5/7] drm/amd/display: Reset plane for anything that's not a FAST update
On Thu, Jul 30, 2020 at 04:36:40PM -0400, Nicholas Kazlauskas wrote: > [Why] > MEDIUM or FULL updates can require global validation or affect > bandwidth. By treating these all simply as surface updates we aren't > actually passing this through DC global validation. > > [How] > There's currently no way to pass surface updates through DC global > validation, nor do I think it's a good idea to change the interface > to accept these. > > DC global validation itself is currently stateless, and we can move > our update type checking to be stateless as well by duplicating DC > surface checks in DM based on DRM properties. > > We wanted to rely on DC automatically determining this since DC knows > best, but DM is ultimately what fills in everything into DC plane > state so it does need to know as well. > > There are basically only three paths that we exercise in DM today: > > 1) Cursor (async update) > 2) Pageflip (fast update) > 3) Full pipe programming (medium/full updates) > > Which means that anything that's more than a pageflip really needs to > go down path #3. > > So this change duplicates all the surface update checks based on DRM > state instead inside of should_reset_plane(). > > Next step is dropping dm_determine_update_type_for_commit and we no > longer require the old DC state at all for global validation. I think we do something similar in i915, where we have a "nothing except base address changed" fast path, but for anything else we fully compute a new state. Obviously you should try to keep global state synchronization to a minimum for this step, so it's not entirely only 2 options. Once we have the states, we compare them and figure out whether we can get away with a fast modeset operation (maybe what you guys call medium update). Anyway I think being slightly more aggressive with computing full state, and then falling back to more optimized update again is a good approach. Only risk is if we you have too much synchronization in your locking (e.g. 
modern compositors do like to change tiling and stuff, especially once you have modifiers enabled, so this shouldn't cause a sync across crtc except when absolutely needed). -Daniel > > Optimization can come later so we don't reset DC planes at all for > MEDIUM udpates and avoid validation, but we might require some extra > checks in DM to achieve this. > > Cc: Bhawanpreet Lakha > Cc: Hersen Wu > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 25 +++ > 1 file changed, 25 insertions(+) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 0d5f45742bb5..2cbb29199e61 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -8336,6 +8336,31 @@ static bool should_reset_plane(struct drm_atomic_state > *state, > if (old_other_state->crtc != new_other_state->crtc) > return true; > > + /* Src/dst size and scaling updates. */ > + if (old_other_state->src_w != new_other_state->src_w || > + old_other_state->src_h != new_other_state->src_h || > + old_other_state->crtc_w != new_other_state->crtc_w || > + old_other_state->crtc_h != new_other_state->crtc_h) > + return true; > + > + /* Rotation / mirroring updates. */ > + if (old_other_state->rotation != new_other_state->rotation) > + return true; > + > + /* Blending updates. */ > + if (old_other_state->pixel_blend_mode != > + new_other_state->pixel_blend_mode) > + return true; > + > + /* Alpha updates. */ > + if (old_other_state->alpha != new_other_state->alpha) > + return true; > + > + /* Colorspace changes. */ > + if (old_other_state->color_range != > new_other_state->color_range || > + old_other_state->color_encoding != > new_other_state->color_encoding) > + return true; > + > /* Framebuffer checks fall at the end. 
*/ > if (!old_other_state->fb || !new_other_state->fb) > continue; > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
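The checks the patch adds to should_reset_plane() amount to a field-by-field comparison of old and new plane state: anything beyond a pure base-address flip forces the full-programming path. A cut-down, hypothetical version (the real code walks drm_atomic_state and compares drm_plane_state members):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the drm_plane_state fields the patch compares. */
struct plane_state {
    uint32_t src_w, src_h, crtc_w, crtc_h;
    unsigned int rotation;
    int pixel_blend_mode;
    uint16_t alpha;
    int color_range, color_encoding;
};

static bool plane_needs_reset(const struct plane_state *o,
                              const struct plane_state *n)
{
    if (o->src_w != n->src_w || o->src_h != n->src_h ||
        o->crtc_w != n->crtc_w || o->crtc_h != n->crtc_h)
        return true;            /* src/dst size or scaling change */
    if (o->rotation != n->rotation)
        return true;            /* rotation / mirroring change */
    if (o->pixel_blend_mode != n->pixel_blend_mode || o->alpha != n->alpha)
        return true;            /* blending or alpha change */
    if (o->color_range != n->color_range ||
        o->color_encoding != n->color_encoding)
        return true;            /* colorspace change */
    return false;               /* fast (pageflip-style) update */
}
```

This matches the three-path model from the commit message: anything that trips one of these checks goes down the full pipe-programming path rather than the pageflip fast path.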
Re: [PATCH 3/7] drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface in prepare_planes
On Thu, Jul 30, 2020 at 04:36:38PM -0400, Nicholas Kazlauskas wrote: > [Why] > We're racing with userspace as the flags could potentially change > from when we acquired and validated them in commit_check. Uh ... I didn't know these could change. I think my comments on Bas' series are even more relevant now. I think long term would be best to bake these flags in at addfb time when modifiers aren't set. And otherwise always use the modifiers flag, and completely ignore the legacy flags here. -Daniel > > [How] > We unfortunately can't drop this function in its entirety from > prepare_planes since we don't know the afb->address at commit_check > time yet. > > So instead of querying new tiling_flags and tmz_surface use the ones > from the plane_state directly. > > While we're at it, also update the force_disable_dcc option based > on the state from atomic check. > > Cc: Bhawanpreet Lakha > Cc: Rodrigo Siqueira > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 36 ++- > 1 file changed, 19 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index bf1881bd492c..f78c09c9585e 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -5794,14 +5794,8 @@ static int dm_plane_helper_prepare_fb(struct drm_plane > *plane, > struct list_head list; > struct ttm_validate_buffer tv; > struct ww_acquire_ctx ticket; > - uint64_t tiling_flags; > uint32_t domain; > int r; > - bool tmz_surface = false; > - bool force_disable_dcc = false; > - > - dm_plane_state_old = to_dm_plane_state(plane->state); > - dm_plane_state_new = to_dm_plane_state(new_state); > > if (!new_state->fb) { > DRM_DEBUG_DRIVER("No FB bound\n"); > @@ -5845,27 +5839,35 @@ static int dm_plane_helper_prepare_fb(struct > drm_plane *plane, > return r; > } > > - amdgpu_bo_get_tiling_flags(rbo, &tiling_flags); > - > - 
tmz_surface = amdgpu_bo_encrypted(rbo); > - > ttm_eu_backoff_reservation(&ticket, &list); > > afb->address = amdgpu_bo_gpu_offset(rbo); > > amdgpu_bo_ref(rbo); > > + /** > + * We don't do surface updates on planes that have been newly created, > + * but we also don't have the afb->address during atomic check. > + * > + * Fill in buffer attributes depending on the address here, but only on > + * newly created planes since they're not being used by DC yet and this > + * won't modify global state. > + */ > + dm_plane_state_old = to_dm_plane_state(plane->state); > + dm_plane_state_new = to_dm_plane_state(new_state); > + > if (dm_plane_state_new->dc_state && > - dm_plane_state_old->dc_state != > dm_plane_state_new->dc_state) { > - struct dc_plane_state *plane_state = > dm_plane_state_new->dc_state; > + dm_plane_state_old->dc_state != dm_plane_state_new->dc_state) { > + struct dc_plane_state *plane_state = > + dm_plane_state_new->dc_state; > + bool force_disable_dcc = !plane_state->dcc.enable; > > - force_disable_dcc = adev->asic_type == CHIP_RAVEN && > adev->in_suspend; > fill_plane_buffer_attributes( > adev, afb, plane_state->format, plane_state->rotation, > - tiling_flags, &plane_state->tiling_info, > - &plane_state->plane_size, &plane_state->dcc, > - &plane_state->address, tmz_surface, > - force_disable_dcc); > + dm_plane_state_new->tiling_flags, > + &plane_state->tiling_info, &plane_state->plane_size, > + &plane_state->dcc, &plane_state->address, > + dm_plane_state_new->tmz_surface, force_disable_dcc); > } > > return 0; > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 0/7] drm/amd/display: Drop DRM private objects from amdgpu_dm
On Thu, Jul 30, 2020 at 04:36:35PM -0400, Nicholas Kazlauskas wrote: > Based on the analysis of the bug from [1] the best course of action seems > to be swapping off of DRM private objects back to subclassing DRM atomic > state instead. > > This patch series implements this change, but not yet the other changes > suggested in the threads from that bug - these will come later. > > CCing dri-devel per Daniel's suggestion since this issue brought > some interesting misuse of private objects. I ended up reading around a bit, and it feels like the sub-objects might make a reasonable private state structure perhaps. Like dc_stream_state, at least when reading around in e.g. dc_remove_stream_from_ctx. But would need to come up with a plan how to integrate this on the other os side of DC I guess :-) Anyway I'd say more evidence that dc_state needs to subclass drm_atomic_state. Another thing I wondered is whether we should rename drm_atomic_state to drm_atomic_state_update, so it's clear it's the container with the updated states, not a real state object thing itself. 
-Daniel > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=207383 > > Nicholas Kazlauskas (7): > drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state > drm/amd/display: Reset plane when tiling flags change > drm/amd/display: Avoid using unvalidated tiling_flags and tmz_surface > in prepare_planes > drm/amd/display: Use validated tiling_flags and tmz_surface in > commit_tail > drm/amd/display: Reset plane for anything that's not a FAST update > drm/amd/display: Drop dm_determine_update_type_for_commit > drm/amd/display: Replace DRM private objects with subclassed DRM > atomic state > > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 967 ++ > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 13 +- > 2 files changed, 343 insertions(+), 637 deletions(-) > > -- > 2.25.1 > > ___ > dri-devel mailing list > dri-de...@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdkfd: force raven as "dgpu" path
We still have a few IOMMU issues which need to be addressed, so force Raven down the "dgpu" path for the moment. IOMMUv2 will be enabled once the issues are fixed. Signed-off-by: Huang Rui --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c index 6a250f8fcfb8..66d9f7280fe8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c @@ -22,6 +22,7 @@ #include #include +#include #include "kfd_crat.h" #include "kfd_priv.h" #include "kfd_topology.h" @@ -781,6 +782,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size) return -ENODATA; } + /* Raven series will go with the fallback path to use virtual CRAT */ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && + boot_cpu_data.x86 == 0x17) + return -ENODATA; + pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL); if (!pcrat_image) return -ENOMEM; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c index d5e790f046b4..93179c928371 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c @@ -116,6 +116,7 @@ static const struct kfd_device_info carrizo_device_info = { .num_xgmi_sdma_engines = 0, .num_sdma_queues_per_engine = 2, }; +#endif static const struct kfd_device_info raven_device_info = { .asic_family = CHIP_RAVEN, @@ -128,13 +129,12 @@ static const struct kfd_device_info raven_device_info = { .num_of_watch_points = 4, .mqd_size_aligned = MQD_SIZE_ALIGNED, .supports_cwsr = true, - .needs_iommu_device = true, + .needs_iommu_device = false, .needs_pci_atomics = true, .num_sdma_engines = 1, .num_xgmi_sdma_engines = 0, .num_sdma_queues_per_engine = 2, }; -#endif static const struct kfd_device_info hawaii_device_info = { .asic_family = CHIP_HAWAII, -- 2.25.1
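The CPU-family check the patch adds to kfd_create_crat_image_acpi() can be read as a small predicate. The sketch below mirrors its logic in isolation; the helper name is hypothetical, and the vendor constant is reproduced here only for illustration (in the kernel it comes from asm/processor.h):

```c
#include <stdbool.h>

#define X86_VENDOR_AMD 2  /* illustrative stand-in for the kernel's constant */

/* Hypothetical stand-alone version of the patch's check: on AMD family
 * 0x17 (which includes the Raven series), the ACPI CRAT image is
 * rejected with -ENODATA so the driver falls back to a virtual CRAT,
 * i.e. the "dgpu" topology path. */
static bool raven_uses_virtual_crat(unsigned int vendor, unsigned int family)
{
    return vendor == X86_VENDOR_AMD && family == 0x17;
}
```

Note the check is per CPU family, not per GPU ASIC: any APU on a family-17h host takes the fallback path, which is exactly the coarse hammer the commit message describes as temporary.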
Re: [PATCH 1/7] drm/amd/display: Store tiling_flags and tmz_surface on dm_plane_state
On Thu, Jul 30, 2020 at 04:36:36PM -0400, Nicholas Kazlauskas wrote: > [Why] > Store these in advance so we can reuse them later in commit_tail without > having to reserve the fbo again. > > These will also be used for checking for tiling changes when deciding > to reset the plane or not. I've also dropped some comments on Bas' series for adding modifiers which might be relevant for shuffling all this. But yeah stuff this into plane state sounds like a good idea. -Daniel > > [How] > This change should mostly be a refactor. Only commit check is affected > for now and I'll drop the get_fb_info calls in prepare_planes and > commit_tail after. > > This runs a prepass loop once we think that all planes have been added > to the context and replaces the get_fb_info calls with accessing the > dm_plane_state instead. > > Cc: Bhawanpreet Lakha > Cc: Rodrigo Siqueira > Signed-off-by: Nicholas Kazlauskas > --- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 60 +++ > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 2 + > 2 files changed, 37 insertions(+), 25 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index 8d64f5fde7e2..7cc5ab90ce13 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -3700,8 +3700,17 @@ static int fill_dc_scaling_info(const struct > drm_plane_state *state, > static int get_fb_info(const struct amdgpu_framebuffer *amdgpu_fb, > uint64_t *tiling_flags, bool *tmz_surface) > { > - struct amdgpu_bo *rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]); > - int r = amdgpu_bo_reserve(rbo, false); > + struct amdgpu_bo *rbo; > + int r; > + > + if (!amdgpu_fb) { > + *tiling_flags = 0; > + *tmz_surface = false; > + return 0; > + } > + > + rbo = gem_to_amdgpu_bo(amdgpu_fb->base.obj[0]); > + r = amdgpu_bo_reserve(rbo, false); > > if (unlikely(r)) { > /* Don't show error message when returning -ERESTARTSYS */ > @@ 
-4124,13 +4133,10 @@ static int fill_dc_plane_attributes(struct > amdgpu_device *adev, > struct drm_crtc_state *crtc_state) > { > struct dm_crtc_state *dm_crtc_state = to_dm_crtc_state(crtc_state); > - const struct amdgpu_framebuffer *amdgpu_fb = > - to_amdgpu_framebuffer(plane_state->fb); > + struct dm_plane_state *dm_plane_state = to_dm_plane_state(plane_state); > struct dc_scaling_info scaling_info; > struct dc_plane_info plane_info; > - uint64_t tiling_flags; > int ret; > - bool tmz_surface = false; > bool force_disable_dcc = false; > > ret = fill_dc_scaling_info(plane_state, &scaling_info); > @@ -4142,15 +4148,12 @@ static int fill_dc_plane_attributes(struct > amdgpu_device *adev, > dc_plane_state->clip_rect = scaling_info.clip_rect; > dc_plane_state->scaling_quality = scaling_info.scaling_quality; > > - ret = get_fb_info(amdgpu_fb, &tiling_flags, &tmz_surface); > - if (ret) > - return ret; > - > force_disable_dcc = adev->asic_type == CHIP_RAVEN && adev->in_suspend; > - ret = fill_dc_plane_info_and_addr(adev, plane_state, tiling_flags, > + ret = fill_dc_plane_info_and_addr(adev, plane_state, > + dm_plane_state->tiling_flags, > &plane_info, > &dc_plane_state->address, > - tmz_surface, > + dm_plane_state->tmz_surface, > force_disable_dcc); > if (ret) > return ret; > @@ -5753,6 +5756,10 @@ dm_drm_plane_duplicate_state(struct drm_plane *plane) > dc_plane_state_retain(dm_plane_state->dc_state); > } > > + /* Framebuffer hasn't been updated yet, so retain old flags. 
*/ > + dm_plane_state->tiling_flags = old_dm_plane_state->tiling_flags; > + dm_plane_state->tmz_surface = old_dm_plane_state->tmz_surface; > + > return &dm_plane_state->base; > } > > @@ -8557,13 +8564,9 @@ dm_determine_update_type_for_commit(struct > amdgpu_display_manager *dm, > continue; > > for_each_oldnew_plane_in_state(state, plane, old_plane_state, > new_plane_state, j) { > - const struct amdgpu_framebuffer *amdgpu_fb = > - to_amdgpu_framebuffer(new_plane_state->fb); > struct dc_plane_info *plane_info = > &bundle->plane_infos[num_plane]; > struct dc_flip_addrs *flip_addr = > &bundle->flip_addrs[num_plane]; > struct dc_scaling_info *scaling_info = > &bundle->scaling_infos[num_plane]; > - uint64_t tiling_flags; > -
[PATCH v2] drm/amdgpu: annotate a false positive recursive locking
[ 584.110304] [ 584.110590] WARNING: possible recursive locking detected [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE [ 584.64] [ 584.111456] kworker/38:1/553 is trying to acquire lock: [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] but task is already holding lock: [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] other info that might help us debug this: [ 584.113689] Possible unsafe locking scenario: [ 584.114350]CPU0 [ 584.114685] [ 584.115014] lock(&adev->reset_sem); [ 584.115349] lock(&adev->reset_sem); [ 584.115678] *** DEADLOCK *** [ 584.116624] May be due to missing lock nesting notation [ 584.117284] 4 locks held by kworker/38:1/553: [ 584.117616] #0: 9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630 [ 584.117967] #1: ac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630 [ 584.118358] #2: c1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [ 584.118786] #3: 9b1603d247a0 (&adev->reset_sem){}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.119222] stack backtrace: [ 584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G OE 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 [ 584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 584.121638] Call Trace: [ 584.122050] dump_stack+0x98/0xd5 [ 584.122499] __lock_acquire+0x1139/0x16e0 [ 584.122931] ? trace_hardirqs_on+0x3b/0xf0 [ 584.123358] ? cancel_delayed_work+0xa6/0xc0 [ 584.123771] lock_acquire+0xb8/0x1c0 [ 584.124197] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.124599] down_write+0x49/0x120 [ 584.125032] ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125472] amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.125910] ? 
amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [ 584.126367] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [ 584.126789] process_one_work+0x29e/0x630 [ 584.127208] worker_thread+0x3c/0x3f0 [ 584.127621] ? __kthread_parkme+0x61/0x90 [ 584.128014] kthread+0x12f/0x150 [ 584.128402] ? process_one_work+0x630/0x630 [ 584.128790] ? kthread_park+0x90/0x90 [ 584.129174] ret_from_fork+0x3a/0x50 Give each adev its own lock_class_key to avoid false positive recursive locking. v2: 1. register adev->lock_key with lockdep, otherwise lockdep will report the warning below [ 1216.705820] BUG: key 890183b647d0 has not been registered! [ 1216.705924] [ cut here ] [ 1216.705972] DEBUG_LOCKS_WARN_ON(1) [ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210 Signed-off-by: Dennis Li Change-Id: I7571efeccbf15483982031d00504a353031a854a diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e97c088d03b3..766dc8f8c8a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -967,6 +967,7 @@ struct amdgpu_device { atomic_tin_gpu_reset; enum pp_mp1_state mp1_state; struct rw_semaphore reset_sem; + struct lock_class_key lock_key; struct amdgpu_doorbell_index doorbell_index; struct mutexnotifier_lock; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 62ecac97fbd2..ae0a576f9895 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -3037,6 +3037,8 @@ int amdgpu_device_init(struct amdgpu_device *adev, mutex_init(&adev->virt.vf_errors.lock); hash_init(adev->mn_hash); init_rwsem(&adev->reset_sem); + lockdep_register_key(&adev->lock_key); + lockdep_set_class(&adev->reset_sem, &adev->lock_key); atomic_set(&adev->in_gpu_reset, 0); mutex_init(&adev->psp.mutex); mutex_init(&adev->notifier_lock); @@ -3411,6 +3413,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
amdgpu_pmu_fini(adev); if (adev->discovery_bin) amdgpu_discovery_fini(adev); + + lockdep_unregister_key(&adev->lock_key); } -- 2.17.1
RE: [PATCH] drm/amdgpu: annotate a false positive recursive locking
[AMD Public Use] > [AMD Public Use] > >> [SNIP] I think it is a limitation of init_rwsem. >>> And exactly that's wrong, this is intentional and perfectly correct. >>> >>> [Dennis Li] I don't understand. Why is it perfectly correct? >>> For example, we define two rw_sems: a and b. If we don't check the init_rwsem >>> definition, we may think case #1 and case #2 have the same behavior, but in >>> fact they are different. >>> >>> Case 1: >>> init_rwsem(&a); >>> init_rwsem(&b); >>> >>> Case 2: >>> void init_rwsem_ext(rw_sem *sem) >>> { >>> init_rwsem(sem); >>> } >>> init_rwsem_ext(&a); >>> init_rwsem_ext(&b); >>> >>> As far as I know it is perfectly possible that the locks in the hive are >>> not always grabbed in the same order. And that's why lockdep is complaining >>> about this. >>> [Dennis Li] No. Let me take a hive with two devices (a and b) to explain why >>> lockdep complains. >>> >>> // First the driver locks the reset_sem of all devices >>> down_write(&a->reset_sem); do_suspend(a); >>> down_write(&b->reset_sem); // Because b shares the same lock_class_key >>> with a, lockdep will treat a and b as the same rw_sem and complain here. >>> do_suspend(b); >>> >>> // do recovery >>> do_hive_recovery() >>> >>> // unlock the reset_sem of all devices do_resume(a); >>> up_write(&a->reset_sem); do_resume(b); up_write(&b->reset_sem); >> Ah! Now I understand what you are working around. So the problem is the >> static lock_class_key in the macro? >> [Dennis Li] Yes. The author of init_rwsem might not have considered a use >> case like ours. Well, this is also really not the intended use case. When you lock the same rwsem class recursively you can easily run into deadlocks if you don't keep the order the same all the time. And abstracting init functions behind your own layer is a no-go in the Linux kernel as well. >>> What we should do instead is to make sure we have only a single lock for >>> the complete hive.
>>> [Dennis Li] If we use a single lock, users must wait for all devices to >>> resume successfully, even when their tasks are only running on device >>> a. It is not friendly to users. >> Well, that is intentional I would say. We can only restart submissions after >> all devices are resumed successfully, because otherwise it can happen that a >> job on device A depends on memory on device B which isn't accessible. >> [Dennis Li] Yes, you are right. The driver must make sure that the shared >> resources (such as the shared memory) are ready before unlocking each adev's lock >> one by one. But each device still has private hardware resources such as >> video and display. > Yeah, but those are negligible and we have a team working on display support > for XGMI. So this will sooner or later become a problem as well. > > I still think that a single rwsem for the whole hive is the best option > here. > > [Dennis Li] For your above concern, we can open a new thread to discuss it. > If we decide to use a single lock after discussing, we can create another > patch for it. That's a really good argument, but I still hesitate to merge this patch. How severe is the lockdep splat? At bare minimum we need a "/* TODO: " comment explaining why we do this and how to remove the workaround again. [Dennis Li] It is not a workaround. According to the design of lockdep, each rw_semaphore should have a separate lock_class_key. I have explained that init_rwsem has the above limitation, so we must correct it. The core network driver (net/core/dev.c) has a similar use case to ours. Regards, Christian. > > Best Regards > Dennis Li >> Regards, >> Christian. >> >>> Regards, >>> Christian. >>> >>> Am 06.08.20 um 11:15 schrieb Li, Dennis: [AMD Official Use Only - Internal Distribution Only] Hi, Christian, For this case, it is safe to use separate lock keys. Please see the definition of init_rwsem below.
Every init_rwsem call uses a new static key, but the devices of the hive share the same initialization code, so their lock_class_keys are the same. I think it is a limitation of init_rwsem. In our case, it should be correct that each adev's reset_sem has a different lock_class_key. BTW, this change doesn't affect deadlock detection; it just corrects it. #define init_rwsem(sem)\ do { \ static struct lock_class_key __key; \ \ __init_rwsem((sem), #sem, &__key); \ } while (0) Best Regards Dennis Li -Original Message- From: Koenig, Christian Sent: Thursday, August 6, 2020 4:13 PM To: Li, Dennis ; amd-gfx@lists.freedesktop.org; Deucher, Alexander ; Kuehling, Felix ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
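The two cases Dennis Li contrasts can be demonstrated outside the kernel. The sketch below is plain C (using GCC/Clang statement expressions; all names are illustrative, not kernel code) that mimics init_rwsem's `static struct lock_class_key __key`: each textual expansion of the macro creates its own static object, while a shared init helper contains only one expansion, so every caller funnels through a single key — which is why all devices of a hive end up in one lockdep class:

```c
/* Mimics init_rwsem()'s per-call-site static key: the static inside
 * the macro is one object per textual expansion, not per invocation. */
#define LOCK_KEY() ({ static char __key; (void *)&__key; })

/* Case 1: two direct expansions -> two distinct keys. */
static void *key_a(void) { return LOCK_KEY(); }
static void *key_b(void) { return LOCK_KEY(); }

/* Case 2: a common init helper has a single expansion, so every
 * "device" initialized through it shares the same key -- the situation
 * lockdep flags as recursive locking on the hive. */
static void *init_common(void) { return LOCK_KEY(); }
static void *hive_dev_a(void) { return init_common(); }
static void *hive_dev_b(void) { return init_common(); }
```

Comparing the returned addresses shows that `key_a` and `key_b` differ while `hive_dev_a` and `hive_dev_b` coincide, matching the case 1 / case 2 distinction in the thread.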
Re: [PATCH] drm/amdgpu: annotate a false positive recursive locking
Am 2020-08-07 um 2:57 a.m. schrieb Christian König: [snip] > That's a really good argument, but I still hesitate to merge this > patch. How severe is the lockdep splat? I argued before that any lockdep splat is bad, because it disables further lockdep checking and can hide other lockdep problems down the road. The longer you live with it, the more potential locking problems you can introduce unknowingly, that you'll need to work through one at a time, as you uncover them later. Regards, Felix > > At bare minimum we need a "/* TODO: " comment why we do this and > how to remove the workaround again. > > Regards, > Christian. > >> >> Best Regards >> Dennis lI >>> Regards, >>> Christian. >>> Regards, Christian. Am 06.08.20 um 11:15 schrieb Li, Dennis: > [AMD Official Use Only - Internal Distribution Only] > > Hi, Christian, > For this case, it is safe to use separated lock key. > Please see the definition of init_rwsem as the below. Every > init_rwsem calling will use a new static key, but devices of the > hive share the same code to do initialization, so their > lock_class_key are the same. I think it is a limitation of > init_rwsem. In our case, it should be correct that reset_sem of > each adev has different lock_class_key. BTW, this change doesn't > effect dead-lock detection and just correct it. > > #define init_rwsem(sem) \ > do { \ > static struct lock_class_key __key; \ > \ > __init_rwsem((sem), #sem, &__key); \ > } while (0) > > Best Regards > Dennis Li > -Original Message- > From: Koenig, Christian > Sent: Thursday, August 6, 2020 4:13 PM > To: Li, Dennis ; amd-gfx@lists.freedesktop.org; > Deucher, Alexander ; Kuehling, Felix > ; Zhang, Hawking > Subject: Re: [PATCH] drm/amdgpu: annotate a false positive recursive > locking > > Preventing locking problems during implementation is obviously a > good approach, but lockdep has proven to be massively useful for > finding and fixing problems. 
> > Disabling lockdep splat by annotating lock with separate classes > is usually a no-go and only allowed if there is no other potential > approach. > > In this case here we should really clean things up instead. > > Christian. > > Am 06.08.20 um 09:44 schrieb Li, Dennis: >> [AMD Official Use Only - Internal Distribution Only] >> >> Hi, Christian, >> I agree with your concern. However we shouldn't rely >> on system to detect dead-lock, and should consider this when >> doing code implementation. In fact, dead-lock detection isn't >> enabled by default. >> For your proposal to remove reset_sem into the hive >> structure, we can open a new topic to discuss it. Currently we >> couldn't make sure which is the best solution. For example, with >> your proposal, we must wait for all devices resuming successfully >> before resubmit an old task in one device, which will effect >> performance. >> >> Best Regards >> Dennis Li >> -Original Message- >> From: amd-gfx On Behalf Of >> Christian König >> Sent: Thursday, August 6, 2020 3:08 PM >> To: Li, Dennis ; amd-gfx@lists.freedesktop.org; >> Deucher, Alexander ; Kuehling, Felix >> ; Zhang, Hawking >> Subject: Re: [PATCH] drm/amdgpu: annotate a false positive >> recursive locking >> >> Am 06.08.20 um 09:02 schrieb Dennis Li: >>> [ 584.110304] >>> [ 584.110590] WARNING: possible recursive locking detected >>> [ 584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: >>> G OE >>> [ 584.64] >>> [ 584.111456] kworker/38:1/553 is trying to acquire lock: >>> [ 584.111721] 9b15ff0a47a0 (&adev->reset_sem){}, at: >>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.112112] >>> but task is already holding lock: >>> [ 584.112673] 9b1603d247a0 (&adev->reset_sem){}, at: >>> amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [ 584.113068] >>> other info that might help us debug this: >>> [ 584.113689] Possible unsafe locking scenario: >>> >>> [ 584.114350] CPU0 >>> [ 584.114685] >>> [ 584.115014] lock(&adev->reset_sem); >>> [ 584.115349] 
lock(&adev->reset_sem); >>> [ 584.115678] >>> *** DEADLOCK *** >>> >>> [ 584.116624] May be due to missing lock nesting notation >>> >>> [ 584.117284] 4 locks held by kworker/38:1/553: >>> [ 584.117616] #0: 9ad635c1d348 >>> ((wq_completion)events){+.+.},