RE: [PATCH] drm/amd/pm: keep the BACO feature enabled for suspend

2021-12-30 Thread Chen, Guchun
[Public]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-----Original Message-----
From: Quan, Evan  
Sent: Thursday, December 30, 2021 6:01 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Chen, Guchun 
; Quan, Evan 
Subject: [PATCH] drm/amd/pm: keep the BACO feature enabled for suspend

This pairs with the workaround that always resets the ASIC in suspend.
Otherwise, the reset, which relies on BACO, will fail.

Fixes: 50583690930d ("drm/amdgpu: always reset the asic in suspend (v2)")

Signed-off-by: Evan Quan 
Change-Id: I39ed072af16e34ef1e1c16b50ace6d46fbc388b9
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 4d867778a65c..7628be2f2301 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1308,10 +1308,16 @@ static int smu_disable_dpms(struct smu_context *smu)
 {
 	struct amdgpu_device *adev = smu->adev;
 	int ret = 0;
+	/*
+	 * TODO: (adev->in_suspend && !adev->in_s0ix) is added to pair
+	 * the workaround which always reset the asic in suspend.
+	 * It's likely that workaround will be dropped in the future.
+	 * Then the change here should be dropped together.
+	 */
 	bool use_baco = !smu->is_apu &&
 		((amdgpu_in_reset(adev) &&
 		  (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) ||
-		 ((adev->in_runpm || adev->in_s4) && amdgpu_asic_supports_baco(adev)));
+		 ((adev->in_runpm || adev->in_s4 || (adev->in_suspend && !adev->in_s0ix)) && amdgpu_asic_supports_baco(adev)));
 
 	/*
 	 * For custom pptable uploading, skip the DPM features
--
2.29.0
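
For readers following the condition change, here is a minimal userspace
sketch of the use_baco predicate before and after the patch. The struct
baco_inputs type and the function names are hypothetical stand-ins made up
for illustration; each field only mirrors one of the driver checks named in
the diff, and nothing here is driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the predicate reads (not a driver type). */
struct baco_inputs {
	bool is_apu;         /* smu->is_apu */
	bool in_reset;       /* amdgpu_in_reset(adev) */
	bool reset_is_baco;  /* reset method == AMD_RESET_METHOD_BACO */
	bool in_runpm;       /* adev->in_runpm */
	bool in_s4;          /* adev->in_s4 */
	bool in_suspend;     /* adev->in_suspend */
	bool in_s0ix;        /* adev->in_s0ix */
	bool supports_baco;  /* amdgpu_asic_supports_baco(adev) */
};

/* Condition as it read before the patch. */
static bool use_baco_before(const struct baco_inputs *s)
{
	return !s->is_apu &&
	       ((s->in_reset && s->reset_is_baco) ||
		((s->in_runpm || s->in_s4) && s->supports_baco));
}

/* Condition after the patch: suspend without s0ix now also qualifies. */
static bool use_baco_after(const struct baco_inputs *s)
{
	return !s->is_apu &&
	       ((s->in_reset && s->reset_is_baco) ||
		((s->in_runpm || s->in_s4 || (s->in_suspend && !s->in_s0ix)) &&
		 s->supports_baco));
}

/* Small self-check covering the case the patch changes. */
static void check_use_baco(void)
{
	struct baco_inputs s = { .in_suspend = true, .supports_baco = true };

	assert(!use_baco_before(&s));  /* old condition: not a BACO case */
	assert(use_baco_after(&s));    /* new condition: BACO case */

	s.in_s0ix = true;              /* s0ix suspend stays excluded */
	assert(!use_baco_after(&s));
}
```

Only the plain suspend case (in_suspend set, in_s0ix clear) changes its
result; APUs and s0ix suspend evaluate the same as before.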


Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2021-12-30 Thread Andrey Grodzovsky

Sure, I guess I can drop this patch then.

Andrey

On 2021-12-24 4:57 a.m., JingWen Chen wrote:

I agree with Shaoyun: if the host finds the GPU engine hang first and does
the FLR, the guest-side threads may not know about it and may still try to
access the hardware (e.g. KFD relies heavily on amdgpu_in_reset and reset_sem
to identify the reset status). This may lead to very bad results.

On 2021/12/24 4:58 PM, Deng, Emily wrote:

These patches look good to me. JingWen will pull these patches, run some
basic TDR tests in an SR-IOV environment, and give feedback.

Best wishes
Emily Deng




-----Original Message-----
From: Liu, Monk 
Sent: Thursday, December 23, 2021 6:14 PM
To: Koenig, Christian ; Grodzovsky, Andrey
; dri-de...@lists.freedesktop.org; amd-
g...@lists.freedesktop.org; Chen, Horace ; Chen,
JingWen ; Deng, Emily 
Cc: dan...@ffwll.ch
Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

[AMD Official Use Only]

@Chen, Horace @Chen, JingWen @Deng, Emily

Please review Andrey's patch.

Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-----Original Message-----
From: Koenig, Christian 
Sent: Thursday, December 23, 2021 4:42 PM
To: Grodzovsky, Andrey ; dri-
de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace

Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

On 22.12.21 at 23:14, Andrey Grodzovsky wrote:

Since FLR work is now serialized against GPU resets, there is no need
for this.

Signed-off-by: Andrey Grodzovsky 

Acked-by: Christian König 


---
   drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ---
   drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ---
   2 files changed, 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 487cd654b69e..7d59a66e3988 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
 
-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
 	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);
 
 	xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
 	} while (timeout > 1);
 
 flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
 		&& (!amdgpu_device_has_job_running(adev) ||
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index e3869067a31d..f82c066c8e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
 	int timeout = NV_MAILBOX_POLL_FLR_TIMEDOUT;
 
-	/* block amdgpu_gpu_recover till msg FLR COMPLETE received,
-	 * otherwise the mailbox msg will be ruined/reseted by
-	 * the VF FLR.
-	 */
-	if (!down_write_trylock(&adev->reset_sem))
-		return;
-
 	amdgpu_virt_fini_data_exchange(adev);
-	atomic_set(&adev->in_gpu_reset, 1);
 
 	xgpu_nv_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
 
@@ -298,9 +290,6 @@ static void xgpu_nv_mailbox_flr_work(struct work_struct *work)
 	} while (timeout > 1);
 
 flr_done:
-	atomic_set(&adev->in_gpu_reset, 0);
-	up_write(&adev->reset_sem);
-
 	/* Trigger recovery for world switch failure if no TDR */
 	if (amdgpu_device_should_recover_gpu(adev)
 		&& (!amdgpu_device_has_job_running(adev) ||
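
To make the discussion above concrete, here is a rough userspace sketch of
the guard that the patch removes from the FLR work handlers. It is only an
illustration under stated assumptions: struct fake_vf and both functions are
hypothetical, a C11 atomic_flag stands in for the try-lock on reset_sem, and
an atomic_int stands in for in_gpu_reset. No kernel API is used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device reset state (not amdgpu_device). */
struct fake_vf {
	atomic_flag reset_lock;  /* plays the role of reset_sem, try-lock only */
	atomic_int in_gpu_reset; /* plays the role of the in_gpu_reset flag */
	int flr_handled;         /* FLR notifications actually processed */
};

/* Guarded variant: skip the FLR work if a reset already holds the lock. */
static bool flr_work_guarded(struct fake_vf *vf)
{
	if (atomic_flag_test_and_set(&vf->reset_lock))
		return false;                 /* lock busy: bail out */

	atomic_store(&vf->in_gpu_reset, 1);   /* other threads can see the reset */
	vf->flr_handled++;                    /* host handshake would happen here */
	atomic_store(&vf->in_gpu_reset, 0);
	atomic_flag_clear(&vf->reset_lock);   /* release the lock */
	return true;
}

/* Unguarded variant: relies entirely on the caller serializing the work. */
static void flr_work_serialized(struct fake_vf *vf)
{
	vf->flr_handled++;
}

/* Small self-check of the two behaviours. */
static void check_flr_guard(void)
{
	struct fake_vf vf = { .reset_lock = ATOMIC_FLAG_INIT };

	assert(flr_work_guarded(&vf));        /* lock free: work runs */
	assert(vf.flr_handled == 1);

	atomic_flag_test_and_set(&vf.reset_lock); /* simulate a reset in flight */
	assert(!flr_work_guarded(&vf));       /* guarded work is skipped */
	assert(vf.flr_handled == 1);

	flr_work_serialized(&vf);             /* unguarded work always runs */
	assert(vf.flr_handled == 2);
}
```

The sketch also shows the point raised in the replies: in the guarded
variant the in-reset flag is set while the FLR is handled, so other threads
polling it can back off, whereas the unguarded variant publishes nothing.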


Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2021-12-30 Thread Andrey Grodzovsky

Thanks a lot, please let me know.

Andrey

On 2021-12-24 3:58 a.m., Deng, Emily wrote:

These patches look good to me. JingWen will pull these patches, run some
basic TDR tests in an SR-IOV environment, and give feedback.

Best wishes
Emily Deng






Re: [PATCH] gpu/drm/radeon:Fix null pointer risk

2021-12-30 Thread Christian König

On 28.12.21 at 08:31, Wen Zhiwei wrote:

If the pointer is not checked for NULL in advance,
there is a risk of dereferencing a NULL pointer.


As far as I can see, that case is impossible. Why do you want to add a
check for it?


Regards,
Christian.



Signed-off-by: Wen Zhiwei 
---
  drivers/gpu/drm/radeon/radeon_vm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index bb53016f3138..d3d342041adf 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -951,7 +951,7 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 
 		if (mem->mem_type == TTM_PL_TT) {
 			bo_va->flags |= RADEON_VM_PAGE_SYSTEM;
-			if (!(bo_va->bo->flags & (RADEON_GEM_GTT_WC | RADEON_GEM_GTT_UC)))
+			if (bo_va->bo && !(bo_va->bo->flags & (RADEON_GEM_GTT_WC | RADEON_GEM_GTT_UC)))
 				bo_va->flags |= RADEON_VM_PAGE_SNOOPED;
 
 		} else {




Re: [PATCH] drm/radeon: use kernel is_power_of_2 rather than local version

2021-12-30 Thread Christian König

On 30.12.21 at 06:00, Jonathan Gray wrote:

Follow the amdgpu change made in
7611750784664db46d0db95631e322aeb263dde7
and replace local radeon function with is_power_of_2().

Signed-off-by: Jonathan Gray 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/radeon/radeon_device.c | 19 +++
  1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 4f0fbf667431..15692cb241fc 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1085,19 +1085,6 @@ static unsigned int radeon_vga_set_decode(struct pci_dev *pdev, bool state)
 		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
 }
 
-/**
- * radeon_check_pot_argument - check that argument is a power of two
- *
- * @arg: value to check
- *
- * Validates that a certain argument is a power of two (all asics).
- * Returns true if argument is valid.
- */
-static bool radeon_check_pot_argument(int arg)
-{
-	return (arg & (arg - 1)) == 0;
-}
-
 /**
  * radeon_gart_size_auto - Determine a sensible default GART size
  * according to ASIC family.
@@ -1126,7 +1113,7 @@ static int radeon_gart_size_auto(enum radeon_family family)
 static void radeon_check_arguments(struct radeon_device *rdev)
 {
 	/* vramlimit must be a power of two */
-	if (!radeon_check_pot_argument(radeon_vram_limit)) {
+	if (!is_power_of_2(radeon_vram_limit)) {
 		dev_warn(rdev->dev, "vram limit (%d) must be a power of 2\n",
 				radeon_vram_limit);
 		radeon_vram_limit = 0;
@@ -1140,7 +1127,7 @@ static void radeon_check_arguments(struct radeon_device *rdev)
 		dev_warn(rdev->dev, "gart size (%d) too small\n",
 				radeon_gart_size);
 		radeon_gart_size = radeon_gart_size_auto(rdev->family);
-	} else if (!radeon_check_pot_argument(radeon_gart_size)) {
+	} else if (!is_power_of_2(radeon_gart_size)) {
 		dev_warn(rdev->dev, "gart size (%d) must be a power of 2\n",
 				radeon_gart_size);
 		radeon_gart_size = radeon_gart_size_auto(rdev->family);
@@ -1163,7 +1150,7 @@ static void radeon_check_arguments(struct radeon_device *rdev)
 		break;
 	}
 
-	if (!radeon_check_pot_argument(radeon_vm_size)) {
+	if (!is_power_of_2(radeon_vm_size)) {
 		dev_warn(rdev->dev, "VM size (%d) must be a power of 2\n",
 			 radeon_vm_size);
 		radeon_vm_size = 4;
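
As a side note on the replacement, here is a small userspace sketch that
puts the removed local helper next to a stand-in for the kernel helper. The
first function is copied from the diff. is_power_of_2_sketch() is a
hypothetical re-implementation written on the assumption that the kernel's
is_power_of_2() in include/linux/log2.h rejects zero; it is not the kernel
code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* The local radeon helper removed by the patch (copied from the diff). */
static bool radeon_check_pot_argument(int arg)
{
	return (arg & (arg - 1)) == 0;
}

/*
 * Userspace stand-in for the kernel's is_power_of_2().
 * ASSUMPTION: mirrors the include/linux/log2.h definition, which treats
 * zero as "not a power of two".
 */
static bool is_power_of_2_sketch(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Compare the two helpers on a few representative values. */
static void compare_pot_helpers(void)
{
	/* Both agree on ordinary positive values. */
	assert(radeon_check_pot_argument(4096) && is_power_of_2_sketch(4096));
	assert(!radeon_check_pot_argument(48) && !is_power_of_2_sketch(48));

	/* They differ on zero: the local helper accepts it. */
	assert(radeon_check_pot_argument(0));
	assert(!is_power_of_2_sketch(0));
}
```

Under that assumption the two helpers only disagree on zero. If 0 is a
legitimate "no limit" setting for a parameter such as radeon_vram_limit, as
the reset to 0 in the diff hints, that case may be worth a second look.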




[pull] amdgpu, amdkfd drm-next-5.17

2021-12-30 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.17.  Now with more S-o-b.

The following changes since commit a342655865b2f14d1fbf346356d3b3360e63e872:

  drm/radeon: Fix syntax errors in comments (2021-12-14 16:11:02 -0500)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-next-5.17-2021-12-30

for you to fetch changes up to 0637d41786a3a9551f33ad8e15bdb40416362028:

  drm/amdgpu: no DC support for headless chips (2021-12-30 08:54:45 -0500)


amd-drm-next-5.17-2021-12-30:

amdgpu:
- Suspend/resume fixes
- Fence fix
- Misc code cleanups
- IP discovery fixes
- SRIOV fixes
- RAS fixes
- GMC 8 VRAM detection fix
- FRU fixes for Aldebaran
- Display fixes

amdkfd:
- SVM fixes
- IP discovery fixes


Alex Deucher (5):
  drm/amdgpu: clean up some leftovers from bring up
  drm/amdgpu: add support for IP discovery gc_info table v2
  drm/amdgpu: fix runpm documentation
  drm/amdgpu: always reset the asic in suspend (v2)
  drm/amdgpu: no DC support for headless chips

Alvin Lee (1):
  drm/amd/display: Fix check for null function ptr

Angus Wang (1):
  drm/amd/display: Changed pipe split policy to allow for multi-display pipe split

Anthony Koo (1):
  drm/amd/display: [FW Promotion] Release 0.0.98

Aric Cyr (1):
  drm/amd/display: 3.2.167

Bokun Zhang (1):
  drm/amdgpu: Filter security violation registers

Changcheng Deng (1):
  drm/amdkfd: use max() and min() to make code cleaner

Charlene Liu (1):
  drm/amd/display: fix B0 TMDS deepcolor no dislay issue

Evan Quan (1):
  drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform

George Shen (2):
  drm/amd/display: Limit max link cap with LTTPR caps
  drm/amd/display: Remove CR AUX RD Interval limit for LTTPR

Guchun Chen (2):
  drm/amdkfd: correct sdma queue number in kfd device init (v3)
  drm/amdgpu: drop redundant semicolon

Huang Rui (1):
  drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence

Jiapeng Chong (1):
  drm/amd/display: Fix warning comparing pointer to 0

José Expósito (1):
  drm/amd/display: fix dereference before NULL check

Kent Russell (4):
  drm/amdgpu: Increase potential product_name to 64 characters
  drm/amdgpu: Enable unique_id for Aldebaran
  drm/amdgpu: Only overwrite serial if field is empty
  drm/amdgpu: Access the FRU on Aldebaran

Lai, Derek (1):
  drm/amd/display: Added power down for DCN10

Leslie Shi (1):
  drm/amdgpu: Call amdgpu_device_unmap_mmio() if device is unplugged to prevent crash in GPU initialization failure

Lijo Lazar (1):
  drm/amd/pm: Fix xgmi link control on aldebaran

Marina Nikolic (1):
  amdgpu/pm: Make sysfs pm attributes as read-only for VFs

Mario Limonciello (2):
  drivers/amd/pm: smu13: use local variable adev
  drm/amd/pm: restore SMU version print statement for dGPUs

Martin Leung (1):
  drm/amd/display: Undo ODM combine

Nicholas Kazlauskas (4):
  drm/amd/display: Fix USB4 null pointer dereference in update_psp_stream_config
  drm/amd/display: Block z-states when stutter period exceeds criteria
  drm/amd/display: Send s0i2_rdy in stream_count == 0 optimization
  drm/amd/display: Set optimize_pwr_state for DCN31

Philip Yang (1):
  drm/amdkfd: fix svm_bo release invalid wait context warning

Prike Liang (1):
  drm/amd/pm: skip setting gfx cgpg in the s0ix suspend-resume

Rajneesh Bhardwaj (1):
  drm/amdgpu: Don't inherit GEM object VMAs in child process

Shen, George (1):
  drm/amd/display: Refactor vendor specific link training sequence

Surbhi Kakarya (1):
  drm/amdgpu: Check the memory can be accesssed by ttm_device_clear_dma_mappings.

Tao Zhou (5):
  drm/amdgpu: add gpu reset control for umc page retirement
  drm/amdkfd: add reset parameter for unmap queues
  drm/amdkfd: add reset queue function for RAS poison (v2)
  drm/amdkfd: reset queue which consumes RAS poison (v2)
  drm/amdgpu: save error count in RAS poison handler

Victor Skvortsov (6):
  drm/amdgpu: Separate vf2pf work item init from virt data exchange
  drm/amdgpu: Add *_SOC15_IP_NO_KIQ() macro definitions
  drm/amdgpu: Modify indirect register access for gmc_v9_0 sriov
  drm/amdgpu: Modify indirect register access for amdkfd_gfx_v9 sriov
  drm/amdgpu: get xgmi info before ip_init
  drm/amdgpu: Modify indirect register access for gfx9 sriov

Wenjing Liu (5):
  drm/amd/display: define link res and make it accessible to all link interfaces
  drm/amd/display: populate link res in both detection and validation
  drm/amd/display: access hpo dp link encoder only through link resource
  drm/amd/display: support dynamic HPO DP link encoder allocation
  drm/amd/display: get and restore link res map

Wesley 

Re: [PATCH] drm/amd/pm: keep the BACO feature enabled for suspend

2021-12-30 Thread Alex Deucher
Reviewed-by: Alex Deucher 

On Thu, Dec 30, 2021 at 5:01 AM Evan Quan  wrote:
>
> This pairs with the workaround that always resets the ASIC in suspend.
> Otherwise, the reset, which relies on BACO, will fail.
>
> Fixes: 50583690930d ("drm/amdgpu: always reset the asic in suspend (v2)")
>
> Signed-off-by: Evan Quan 
> Change-Id: I39ed072af16e34ef1e1c16b50ace6d46fbc388b9
> ---
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index 4d867778a65c..7628be2f2301 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -1308,10 +1308,16 @@ static int smu_disable_dpms(struct smu_context *smu)
>  {
> struct amdgpu_device *adev = smu->adev;
> int ret = 0;
> +   /*
> +* TODO: (adev->in_suspend && !adev->in_s0ix) is added to pair
> +* the workaround which always reset the asic in suspend.
> +* It's likely that workaround will be dropped in the future.
> +* Then the change here should be dropped together.
> +*/
> bool use_baco = !smu->is_apu &&
> ((amdgpu_in_reset(adev) &&
>   (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) ||
> -((adev->in_runpm || adev->in_s4) && amdgpu_asic_supports_baco(adev)));
> +((adev->in_runpm || adev->in_s4 || (adev->in_suspend && !adev->in_s0ix)) && amdgpu_asic_supports_baco(adev)));
>
> /*
>  * For custom pptable uploading, skip the DPM features
> --
> 2.29.0
>


Re: [pull] amdgpu drm-fixes-5.16

2021-12-30 Thread Alex Deucher
On Thu, Dec 30, 2021 at 12:29 AM Dave Airlie  wrote:
>
> On Thu, 30 Dec 2021 at 01:51, Alex Deucher  wrote:
> >
> > Hi Dave, Daniel,
>
> Just FYI: on merging this into tip I got a conflict, and I'm not sure
> which answer is right.
>
> fixes has:
> ee2698cf79cc759a397c61086c758d4cc85938bf
> Author: Angus Wang 
> Date:   Thu Dec 9 17:27:01 2021 -0500
>
> drm/amd/display: Changed pipe split policy to allow for
> multi-display pipe split
>
> next has:
> 1edf5ae1fdaffb67c1b93e98df670cbe535d13cf
> Author: Zhan Liu 
> Date:   Mon Nov 8 19:31:00 2021 -0500
>
> drm/amd/display: enable seamless boot for DCN301
>
> -.pipe_split_policy = MPC_SPLIT_AVOID_MULT_DISP,
> fixes is +.pipe_split_policy = MPC_SPLIT_DYNAMIC,
> next is +.pipe_split_policy = MPC_SPLIT_AVOID,
>
> I've chosen the -fixes answer for now, but it would be good to have
> someone review it before Linus merges.

It should ultimately be MPC_SPLIT_DYNAMIC.  -next has an extra patch
which changes it to an intermediate value before this patch changes it
to MPC_SPLIT_DYNAMIC.

Alex


>
> Dave.


[PATCH] drm/amd/pm: keep the BACO feature enabled for suspend

2021-12-30 Thread Evan Quan
This pairs with the workaround that always resets the ASIC in suspend.
Otherwise, the reset, which relies on BACO, will fail.

Fixes: 50583690930d ("drm/amdgpu: always reset the asic in suspend (v2)")

Signed-off-by: Evan Quan 
Change-Id: I39ed072af16e34ef1e1c16b50ace6d46fbc388b9
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 4d867778a65c..7628be2f2301 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1308,10 +1308,16 @@ static int smu_disable_dpms(struct smu_context *smu)
 {
 	struct amdgpu_device *adev = smu->adev;
 	int ret = 0;
+	/*
+	 * TODO: (adev->in_suspend && !adev->in_s0ix) is added to pair
+	 * the workaround which always reset the asic in suspend.
+	 * It's likely that workaround will be dropped in the future.
+	 * Then the change here should be dropped together.
+	 */
 	bool use_baco = !smu->is_apu &&
 		((amdgpu_in_reset(adev) &&
 		  (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) ||
-		 ((adev->in_runpm || adev->in_s4) && amdgpu_asic_supports_baco(adev)));
+		 ((adev->in_runpm || adev->in_s4 || (adev->in_suspend && !adev->in_s0ix)) && amdgpu_asic_supports_baco(adev)));
 
 	/*
 	 * For custom pptable uploading, skip the DPM features
-- 
2.29.0



Re: [PATCH] drm/amdgpu: add dummy event6 for vega10

2021-12-30 Thread JingWen Chen
Reviewed-by: Jingwen Chen 

On 2021/12/29 6:38 PM, James Yao wrote:
> [why]
> A malicious mailbox event1 makes driver loading fail on vega10.
> A dummy event6 prevents the driver from taking the response to a
> malicious event1 as its own.
>
> [how]
> On vega10, send a mailbox event6 before sending event1.
>
> Signed-off-by: James Yao 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  4 
>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c| 11 +++
>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h|  2 ++
>  3 files changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index f8e574cc0e22..d9509c3482e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -727,6 +727,10 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
>   vi_set_virt_ops(adev);
>   break;
>   case CHIP_VEGA10:
> + soc15_set_virt_ops(adev);
> + /* send a dummy GPU_INIT_DATA request to host on vega10 */
> + amdgpu_virt_request_init_data(adev);
> + break;
>   case CHIP_VEGA20:
>   case CHIP_ARCTURUS:
>   case CHIP_ALDEBARAN:
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> index 0077e738db31..56da5ab82987 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
> @@ -180,6 +180,11 @@ static int xgpu_ai_send_access_requests(struct amdgpu_device *adev,
>   RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
>   mmBIF_BX_PF0_MAILBOX_MSGBUF_RCV_DW2));
>   }
> + } else if (req == IDH_REQ_GPU_INIT_DATA){
> + /* Dummy REQ_GPU_INIT_DATA handling */
> + r = xgpu_ai_poll_msg(adev, IDH_REQ_GPU_INIT_DATA_READY);
> + /* version set to 0 since dummy */
> + adev->virt.req_init_data_ver = 0;   
>   }
>  
>   return 0;
> @@ -381,10 +386,16 @@ void xgpu_ai_mailbox_put_irq(struct amdgpu_device *adev)
>   amdgpu_irq_put(adev, &adev->virt.rcv_irq, 0);
>  }
>  
> +static int xgpu_ai_request_init_data(struct amdgpu_device *adev)
> +{
> + return xgpu_ai_send_access_requests(adev, IDH_REQ_GPU_INIT_DATA);
> +}
> +
>  const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
>   .req_full_gpu   = xgpu_ai_request_full_gpu_access,
>   .rel_full_gpu   = xgpu_ai_release_full_gpu_access,
>   .reset_gpu = xgpu_ai_request_reset,
>   .wait_reset = NULL,
>   .trans_msg = xgpu_ai_mailbox_trans_msg,
> + .req_init_data  = xgpu_ai_request_init_data,
>  };
> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> index f9aa4d0bb638..fa7e13e0459e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h
> @@ -35,6 +35,7 @@ enum idh_request {
>   IDH_REQ_GPU_FINI_ACCESS,
>   IDH_REL_GPU_FINI_ACCESS,
>   IDH_REQ_GPU_RESET_ACCESS,
> + IDH_REQ_GPU_INIT_DATA,
>  
>   IDH_LOG_VF_ERROR   = 200,
>   IDH_READY_TO_RESET  = 201,
> @@ -48,6 +49,7 @@ enum idh_event {
>   IDH_SUCCESS,
>   IDH_FAIL,
>   IDH_QUERY_ALIVE,
> + IDH_REQ_GPU_INIT_DATA_READY,
>  
>   IDH_TEXT_MESSAGE = 255,
>  };
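
For readers wondering where the "event1" and "event6" names in the commit
message may come from, here is a small sketch of how implicit enum
numbering would produce them. This is an assumption, not something the diff
confirms: the first two enumerators and the starting value of 1 are not
visible in the hunk and are guessed from the naming of the visible entries.
All names carry an SK_ prefix to mark them as hypothetical.

```c
#include <assert.h>

/*
 * Sketch of the request enum after the patch.
 * ASSUMPTION: the list starts at 1 with an INIT_ACCESS request/release
 * pair; only the entries from FINI_ACCESS onward appear in the diff.
 */
enum idh_request_sketch {
	SK_IDH_REQ_GPU_INIT_ACCESS = 1,  /* "event1" under this assumption */
	SK_IDH_REL_GPU_INIT_ACCESS,
	SK_IDH_REQ_GPU_FINI_ACCESS,
	SK_IDH_REL_GPU_FINI_ACCESS,
	SK_IDH_REQ_GPU_RESET_ACCESS,
	SK_IDH_REQ_GPU_INIT_DATA,        /* entry added by the patch */

	SK_IDH_LOG_VF_ERROR = 200,
	SK_IDH_READY_TO_RESET = 201,
};

/* Verify the implicit values that the numbering relies on. */
static void check_event_numbers(void)
{
	assert(SK_IDH_REQ_GPU_INIT_ACCESS == 1);
	assert(SK_IDH_REQ_GPU_INIT_DATA == 6);   /* "event6" */
	assert(SK_IDH_LOG_VF_ERROR == 200);      /* explicit values unaffected */
}
```

Because the new entry is appended just before the explicitly numbered
values, it takes the next free implicit value without shifting any existing
request code.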