date:20220411

AMD Display Core (DC) patches (was: [PATCH 13/16] drm/amd/display: Revert FEC check in validation)

2022-04-11 Thread Paul Menzel

[Cc: +dri-de...@lists.freedesktop.org, +Daniel Vetter, +Alexander 
Deucher, +Greg KH]



Dear Alex,


I am a little confused and upset about how Display Core patches are 
handled in the Linux kernel.



On 25.03.22 at 23:53, Alex Hung wrote:

From: Martin Leung 


git puts a line “This reverts commit …” into the commit message when 
something is reverted. Why isn’t this here? Right now, commit 
7d56a154e22f, reverted here, is proposed for the stable series. I guess 
that is because these indicators and metadata are missing.



why and how:
causes failure on install on certain machines


Why are commit messages of this kind accepted? What does “failure on 
install” even mean? Why can’t the machine configuration be documented, so 
the failure can be reproduced when necessary?


No less confusing, the date you posted it on amd-gfx is from March 25th, 
2022, but the author date of the commit in agd5f/amd-staging-drm-next is 
`Fri Mar 18 11:12:36 2022 -0400`. Why is the patch missing the Date 
field then?



Reviewed-by: George Shen 
Acked-by: Alex Hung 
Signed-off-by: Martin Leung 


Shouldn’t the Signed-off-by line by the author go first?

You committed this on `Mon Mar 28 08:26:48 2022 -0600`, while you posted 
the patch on amd-gfx on Friday. How should *proper* review happen over 
the weekend?



---
  drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index f2ad8f58e69c..c436db416708 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1496,10 +1496,6 @@ bool dc_validate_boot_timing(const struct dc *dc,
 	if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
 		return false;
 
-	/* Check for FEC status*/
-	if (link->link_enc->funcs->fec_is_active(link->link_enc))
-		return false;
-
 	enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
 	if (enc_inst == ENGINE_ID_UNKNOWN)


The patch reverted here also lacked proper review, had a commit message 
that left much to be desired, and did not follow the Linux kernel coding 
style (missing space before the comment terminator), so it should not have 
been committed in the first place.


Seeing how many people are on the Cc list, I would have hoped that 
somebody had noticed and commented. The current state also makes it really 
hard for non-AMD employees to get the information needed to do proper 
reviews, as the necessary documentation is non-public. So good or even 
excellent commit messages are a must. I seem to remember that you once 
replied to me that Display Core patches are also shared with the 
Microsoft Windows driver, which restricts the workflow options. But I think 
the issues I mentioned are unrelated to that. I know graphics hardware is 
very complex, but if the quality of the commits and reviews were improved, 
it would hopefully save time for everyone in the end, as fewer bugs would 
be introduced.


Could the AMD team please address these issues as soon as possible?


Kind regards,

Paul

Re: [PATCH 04/13] drm/amd/display: FEC check in timing validation

2022-04-11 Thread Paul Menzel


Dear Alex,


On 19.03.22 at 08:43, Paul Menzel wrote:

Dear Alex, dear Chiawen,


Thank you for your patch.

On 18.03.22 at 22:47, Alex Hung wrote:

From: Chiawen Huang 

[Why]
disable/enable leads fec mismatch between hw/sw fec state.


1.  Disable/enable of what?
2.  How can this be reproduced?
3.  s/fec/FEC/


[How]
check fec status to fastboot on/off.


What do you mean by “to fastboot on/off”?


Reviewed-by: Anthony Koo 
Acked-by: Alex Hung 
Signed-off-by: Chiawen Huang 
---
  drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c

index f6e19efea756..75f9c97bebb0 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1496,6 +1496,10 @@ bool dc_validate_boot_timing(const struct dc *dc,
  if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
  return false;
+    /* Check for FEC status*/


Missing space before `*/`.


+    if (link->link_enc->funcs->fec_is_active(link->link_enc))
+    return false;
+
  enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
  if (enc_inst == ENGINE_ID_UNKNOWN)


I just saw that this patch was committed on March 25th, 2022 (commit 
7d56a154e22f) with my comments ignored. Could you please explain why?



Kind regards,

Paul

Re: Vega 56 failing to process EDID from VR Headset

2022-04-11 Thread Paul Menzel


Dear James,


On 11.04.22 at 23:39, James Dutton wrote:


I have an Oculus Rift S, that I am trying to get working in Linux.


Please always mention the Linux kernel version.


I have an AMD Vega 56  graphics card.

The VR headset plugs into a display port of the Vega56.
The amdgpu driver sees the connection, and tries to process it.
The problem is it cannot process the EDID, so fails to recognise the
VR headset, and the VR headset does not work as a result.
Please find the EDID below.
I am guessing that the following is causing the problem:
Established Timings I & II: none
Standard Timings: none
This forces the driver to parse the Detailed mode, which it fails to do.
If it helps, when attached to windows 10, it uses 1440x2560, portrait mode.
Some dmesg lines that may be useful:
// We should pick 1440x2560 as Windows picks that, but for some reason
it rejects it with error 10.
[10402.650734] [drm:create_validate_stream_for_sink [amdgpu]] Mode
1440x2560 (clk 571570) failed DC validation with error 10 (No DP link
bandwidth)
[10402.650991] [drm:update_stream_scaling_settings [amdgpu]]
Destination Rectangle x:0  y:0  width:1440  height:2560
[10402.651225] [drm:create_validate_stream_for_sink [amdgpu]] Mode
1440x2560 (clk 571570) failed DC validation with error 10 (No DP link
bandwidth)


(Please use an email program, which does not wrap lines after 72 
characters.)



Can anyone give me some pointers as to how to get the amdgpu
driver to accept this EDID?


So, did you make any changes to Linux? Why do you think the EDID is at fault?

[…]

I suggest analyzing why `No DP link bandwidth` is logged. The macro is 
`DC_NO_DP_LINK_BANDWIDTH`, and you should first check why 
`dp_validate_mode_timing()` in 
`drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c` returns false.



Kind regards,

Paul


PS: Using the issue tracker [1] might make it easier to keep track of 
this problem, and also to attach all the necessary information.



[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/

Re: [PATCH] drm/amdkfd: shrink bitmap size in struct svm_validate_context

2022-04-11 Thread Felix Kuehling


On 2022-04-11 at 23:06, Lang Yu wrote:

A MAX_GPU_INSTANCE bits bitmap will suffice.

Signed-off-by: Lang Yu 


Thanks for catching that. The patch is

Reviewed-by: Felix Kuehling 



---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8b6adc142e25..459fa07a3bcc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1370,7 +1370,7 @@ struct svm_validate_context {
struct kfd_process *process;
struct svm_range *prange;
bool intr;
-   unsigned long bitmap[MAX_GPU_INSTANCE];
+   DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
struct list_head validate_list;
struct ww_acquire_ctx ticket;

Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-11 Thread Lazar, Lijo





On 4/12/2022 9:38 AM, Darren Powell wrote:

Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
cases exist where messages are silently dropped with no error returned
to the caller. These cases occur in unusual situations where either:
  1. the message target is a virtual GPU, or


This is not fully correct: only messages that are not valid for a 
virtual GPU are dropped, not all of them.


Thanks,
Lijo


  2. a PCI recovery is underway and the HW is not yet in sync with the SW

For more details see
  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
state")

(v2)
   Reworked with suggestions from Luben & Paul

Signed-off-by: Darren Powell 
---
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..8008ae5508e6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
   * completion of the command, and return back a value from the SMU in
   * @read_arg pointer.
   *
- * Return 0 on success, -errno on error, if we weren't able to send
- * the message or if the message completed with some kind of
- * error. See __smu_cmn_reg2errno() for details of the -errno.
+ * Return 0 on success, -errno when a problem is encountered sending
+ * message or receiving reply. If there is a PCI bus recovery or
+ * the destination is a virtual GPU, the message is simply dropped and
+ * success is also returned.
+ * See __smu_cmn_reg2errno() for details of the -errno.
   *
   * If we weren't able to send the message to the SMU, we also print
   * the error to the standard log.

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

[PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-11 Thread Darren Powell

Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
cases exist where messages are silently dropped with no error returned
to the caller. These cases occur in unusual situations where either:
 1. the message target is a virtual GPU, or
 2. a PCI recovery is underway and the HW is not yet in sync with the SW

For more details see
 commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
 commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")

(v2)
  Reworked with suggestions from Luben & Paul

Signed-off-by: Darren Powell 
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..8008ae5508e6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
  * completion of the command, and return back a value from the SMU in
  * @read_arg pointer.
  *
- * Return 0 on success, -errno on error, if we weren't able to send
- * the message or if the message completed with some kind of
- * error. See __smu_cmn_reg2errno() for details of the -errno.
+ * Return 0 on success, -errno when a problem is encountered sending
+ * message or receiving reply. If there is a PCI bus recovery or
+ * the destination is a virtual GPU, the message is simply dropped and
+ * success is also returned.
+ * See __smu_cmn_reg2errno() for details of the -errno.
  *
  * If we weren't able to send the message to the SMU, we also print
  * the error to the standard log.

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
-- 
2.35.1

Re: [Intel-gfx] [V2 1/3] drm/debug: Expose connector's max supported bpc via debugfs

2022-04-11 Thread Modem, Bhanuprakash


On Tue-12-04-2022 08:37 am, Murthy, Arun R wrote:

+static int output_bpc_show(struct seq_file *m, void *data) {


Would it be better to name this function drm_output_bpc_show()?


As we are using DEFINE_SHOW_ATTRIBUTE() to define the file_operations, this 
function name must have the form <name>_show(). Otherwise, we would either 
need to define new file_operations to use the suggested name, or rename 
the debugfs file to "drm_output_bpc".


Also, to stay consistent with the other debugfs files, I think it's ok to 
use output_bpc_show().


- Bhanu



Thanks and Regards,
Arun R Murthy

[PATCH v2 5/5] drm/radeon: Use TTM builtin resource manager debugfs code

2022-04-11 Thread Zack Rusin

From: Zack Rusin 

Switch to using the TTM resource manager debugfs helpers. The
functionality is largely the same.
The TTM resource managers need to stay valid for as long as the
drm debugfs_root is valid.

Signed-off-by: Zack Rusin 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: "Pan, Xinhui" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: amd-gfx@lists.freedesktop.org
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 36 +
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 44594d16611f..d33fec488713 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -781,17 +781,6 @@ void radeon_ttm_set_active_vram_size(struct radeon_device 
*rdev, u64 size)
 
 #if defined(CONFIG_DEBUG_FS)
 
-static int radeon_mm_vram_dump_table_show(struct seq_file *m, void *unused)
-{
-   struct radeon_device *rdev = (struct radeon_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&rdev->mman.bdev,
-   TTM_PL_VRAM);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
 static int radeon_ttm_page_pool_show(struct seq_file *m, void *data)
 {
struct radeon_device *rdev = (struct radeon_device *)m->private;
@@ -799,19 +788,6 @@ static int radeon_ttm_page_pool_show(struct seq_file *m, 
void *data)
return ttm_pool_debugfs(&rdev->mman.bdev.pool, m);
 }
 
-static int radeon_mm_gtt_dump_table_show(struct seq_file *m, void *unused)
-{
-   struct radeon_device *rdev = (struct radeon_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&rdev->mman.bdev,
-   TTM_PL_TT);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
-DEFINE_SHOW_ATTRIBUTE(radeon_mm_vram_dump_table);
-DEFINE_SHOW_ATTRIBUTE(radeon_mm_gtt_dump_table);
 DEFINE_SHOW_ATTRIBUTE(radeon_ttm_page_pool);
 
 static int radeon_ttm_vram_open(struct inode *inode, struct file *filep)
@@ -930,15 +906,15 @@ static void radeon_ttm_debugfs_init(struct radeon_device 
*rdev)
 
debugfs_create_file("radeon_vram", 0444, root, rdev,
&radeon_ttm_vram_fops);
-
debugfs_create_file("radeon_gtt", 0444, root, rdev,
&radeon_ttm_gtt_fops);
-
-   debugfs_create_file("radeon_vram_mm", 0444, root, rdev,
-   &radeon_mm_vram_dump_table_fops);
-   debugfs_create_file("radeon_gtt_mm", 0444, root, rdev,
-   &radeon_mm_gtt_dump_table_fops);
debugfs_create_file("ttm_page_pool", 0444, root, rdev,
&radeon_ttm_page_pool_fops);
+   ttm_resource_manager_create_debugfs(ttm_manager_type(&rdev->mman.bdev,
+TTM_PL_VRAM),
+   root, "radeon_vram_mm");
+   ttm_resource_manager_create_debugfs(ttm_manager_type(&rdev->mman.bdev,
+TTM_PL_TT),
+   root, "radeon_gtt_mm");
 #endif
 }
-- 
2.32.0

[PATCH v2 3/5] drm/amdgpu: Use TTM builtin resource manager debugfs code

2022-04-11 Thread Zack Rusin

From: Zack Rusin 

Switch to using the TTM resource manager debugfs helpers. It's
exactly the same functionality but the debugfs code is shared with
other drivers.

The TTM resource managers need to stay valid for as long as the
drm debugfs_root is valid.

Signed-off-by: Zack Rusin 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: "Pan, Xinhui" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Felix Kuehling 
Cc: Nirmoy Das 
Cc: Thomas Zimmermann 
Cc: amd-gfx@lists.freedesktop.org
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 86 +
 1 file changed, 16 insertions(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57ac118fc266..37fe9fb3f8e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -2079,17 +2079,6 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device 
*adev, int mem_type)
 
 #if defined(CONFIG_DEBUG_FS)
 
-static int amdgpu_mm_vram_table_show(struct seq_file *m, void *unused)
-{
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev,
-   TTM_PL_VRAM);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
 static int amdgpu_ttm_page_pool_show(struct seq_file *m, void *unused)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
@@ -2097,55 +2086,6 @@ static int amdgpu_ttm_page_pool_show(struct seq_file *m, 
void *unused)
return ttm_pool_debugfs(&adev->mman.bdev.pool, m);
 }
 
-static int amdgpu_mm_tt_table_show(struct seq_file *m, void *unused)
-{
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev,
-   TTM_PL_TT);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
-static int amdgpu_mm_gds_table_show(struct seq_file *m, void *unused)
-{
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev,
-   AMDGPU_PL_GDS);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
-static int amdgpu_mm_gws_table_show(struct seq_file *m, void *unused)
-{
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev,
-   AMDGPU_PL_GWS);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
-static int amdgpu_mm_oa_table_show(struct seq_file *m, void *unused)
-{
-   struct amdgpu_device *adev = (struct amdgpu_device *)m->private;
-   struct ttm_resource_manager *man = ttm_manager_type(&adev->mman.bdev,
-   AMDGPU_PL_OA);
-   struct drm_printer p = drm_seq_file_printer(m);
-
-   ttm_resource_manager_debug(man, &p);
-   return 0;
-}
-
-DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_vram_table);
-DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_tt_table);
-DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_gds_table);
-DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_gws_table);
-DEFINE_SHOW_ATTRIBUTE(amdgpu_mm_oa_table);
 DEFINE_SHOW_ATTRIBUTE(amdgpu_ttm_page_pool);
 
 /*
@@ -2355,17 +2295,23 @@ void amdgpu_ttm_debugfs_init(struct amdgpu_device *adev)
 &amdgpu_ttm_vram_fops, adev->gmc.mc_vram_size);
debugfs_create_file("amdgpu_iomem", 0444, root, adev,
&amdgpu_ttm_iomem_fops);
-   debugfs_create_file("amdgpu_vram_mm", 0444, root, adev,
-   &amdgpu_mm_vram_table_fops);
-   debugfs_create_file("amdgpu_gtt_mm", 0444, root, adev,
-   &amdgpu_mm_tt_table_fops);
-   debugfs_create_file("amdgpu_gds_mm", 0444, root, adev,
-   &amdgpu_mm_gds_table_fops);
-   debugfs_create_file("amdgpu_gws_mm", 0444, root, adev,
-   &amdgpu_mm_gws_table_fops);
-   debugfs_create_file("amdgpu_oa_mm", 0444, root, adev,
-   &amdgpu_mm_oa_table_fops);
debugfs_create_file("ttm_page_pool", 0444, root, adev,
&amdgpu_ttm_page_pool_fops);
+   ttm_resource_manager_create_debugfs(ttm_manager_type(&adev->mman.bdev,
+TTM_PL_VRAM),
+   root, "amdgpu_vram_mm");
+   ttm_resource_manager_create_debugfs(ttm_manager_type(&adev->mman.bdev,
+

RE: [Intel-gfx] [V2 2/3] drm/i915/display/debug: Expose crtc current bpc via debugfs

2022-04-11 Thread Murthy, Arun R

> -----Original Message-----
> From: Intel-gfx  On Behalf Of
> Bhanuprakash Modem
> Sent: Monday, April 11, 2022 3:21 PM
> To: intel-...@lists.freedesktop.org; dri-de...@lists.freedesktop.org; amd-
> g...@lists.freedesktop.org; jani.nik...@linux.intel.com;
> ville.syrj...@linux.intel.com; harry.wentl...@amd.com; Sharma, Swati2
> 
> Subject: [Intel-gfx] [V2 2/3] drm/i915/display/debug: Expose crtc current bpc
> via debugfs
> 
> This new debugfs entry exposes the bpc currently in use by the crtc.
> It is very useful for verifying from IGT whether we enter the correct
> output color depth.
> 
> This patch will also add the connector's max supported bpc to
> "i915_display_info" debugfs.
> 
> Example:
> cat /sys/kernel/debug/dri/0/crtc-0/i915_current_bpc
> Current: 8
> 
> Cc: Jani Nikula 
> Cc: Ville Syrjälä 
> Cc: Uma Shankar 
> Signed-off-by: Bhanuprakash Modem 

Reviewed-by: Arun R Murthy 

Thanks and Regards,
Arun R Murthy
---

RE: [Intel-gfx] [V2 1/3] drm/debug: Expose connector's max supported bpc via debugfs

2022-04-11 Thread Murthy, Arun R

> +static int output_bpc_show(struct seq_file *m, void *data) {

Would it be better to name this function drm_output_bpc_show()?

Thanks and Regards,
Arun R Murthy

[PATCH] drm/amdkfd: shrink bitmap size in struct svm_validate_context

2022-04-11 Thread Lang Yu

A MAX_GPU_INSTANCE bits bitmap will suffice.

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 8b6adc142e25..459fa07a3bcc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1370,7 +1370,7 @@ struct svm_validate_context {
struct kfd_process *process;
struct svm_range *prange;
bool intr;
-   unsigned long bitmap[MAX_GPU_INSTANCE];
+   DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
struct ttm_validate_buffer tv[MAX_GPU_INSTANCE];
struct list_head validate_list;
struct ww_acquire_ctx ticket;
-- 
2.25.1

RE: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH

2022-04-11 Thread Zhou1, Tao

[AMD Official Use Only]

Hi Stanley,

The flag is set by the poison IRQ handler of a RAS block, such as the 
VCN/JPEG poison IRQ handler. It is not configured in RAS init.

Regards,
Tao

> -----Original Message-----
> From: Yang, Stanley 
> Sent: Monday, April 11, 2022 10:12 PM
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Lazar,
> Lijo ; Ziya, Mohammad zafar
> ; Zhang, Hawking
> ; Chai, Thomas 
> Subject: Re: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS
> IH
> 
> [AMD Official Use Only]
> 
> Hi Tao,
> 
> Regarding this patch series, I have one question: is the ras_ih_flag set
> according to the poison mode configuration? If so, the driver will handle
> poison as soon as it gets an ecc_irq interrupt, but at that moment there
> may be no app consuming it, which seems to conflict with the definition
> of poison consumption.
> 
> Regards,
> Stanley
> > -----Original Message-----
> > From: Zhou1, Tao
> > Sent: Monday, April 11, 2022 7:08 PM
> > To: amd-gfx@lists.freedesktop.org; Lazar, Lijo; Ziya, Mohammad zafar;
> > Zhang, Hawking; Yang, Stanley; Chai, Thomas
> > Cc: Zhou1, Tao
> > Subject: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH
> >
> > So we can distinguish RAS poison consumption interrupt from UE interrupt.
> >
> > Signed-off-by: Tao Zhou 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> > index 606df8869b89..380f4c3020c7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> > @@ -314,6 +314,11 @@ enum amdgpu_ras_ret {
> >   AMDGPU_RAS_PT,
> >  };
> >
> > +enum amdgpu_ras_ih_flag {
> > + AMDGPU_RAS_IH_POISON_CONSUMPTION = 0,
> > + AMDGPU_RAS_IH_LAST,
> > +};
> > +
> >  struct ras_common_if {
> >   enum amdgpu_ras_block block;
> >   enum amdgpu_ras_error_type type; @@ -419,6 +424,8 @@ struct
> > ras_ih_data {
> >   unsigned int aligned_element_size;
> >   unsigned int rptr;
> >   unsigned int wptr;
> > + /* interrupt type flag */
> > + unsigned int flag;
> >  };
> >
> >  struct ras_manager {
> > --
> > 2.35.1
>

Re: [PATCH ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Lazar, Lijo




Thanks Jesse.

Please also add below before you commit.

Fixes: 1ef60191b259 ("drm/amdgpu: Use flexible array member")

Reviewed-by: Lijo Lazar 


Thanks,
Lijo

On 4/12/2022 7:22 AM, Zhang, Jesse(Jie) wrote:

[AMD Official Use Only]

Thanks, Paul Menzel.
I have attached the patch file.

-----Original Message-----
From: Paul Menzel 
Sent: Monday, 11 April 2022 11:21 pm
To: Zhang, Jesse(Jie) 
Cc: amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Huang, Ray 

Subject: Re: [PATCH ] drm/amdgpu: fix discovery ip failed

[CAUTION: External Email]

Dear Jie,


Thank you for your patch.


On 11.04.22 at 17:15, Zhang, Jesse(Jie) wrote:

You might want to add a space before the (.


[AMD Official Use Only]


Please send a patch with `git format-patch` or similar.


Fix discovery ip failed, and the log:


On what system?


[   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
number_instance (64) from ip discovery blob
[   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (45056) from ip discovery blob
[   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
number_instance (66) from ip discovery blob
[   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (45568) from ip discovery blob
[   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
number_instance (66) from ip discovery blob
[   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (480) from ip discovery blob
[   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (608) from ip discovery blob
[   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (640) from ip discovery blob
[   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
number_instance (64) from ip discovery blob
[   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
number_instance (64) from ip discovery blob
[   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
hw_id (28672) from ip discovery blob


Please describe the reason for the failure, and your fix.

A Signed-off-by line is missing.


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 766006a075ec..a778b0392e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device 
*adev,
  res = kobject_add(&ip_hw_instance->kobj, NULL,
"%d", ip_hw_instance->num_instance);
next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 
1);
+   ip_offset += struct_size(ip, base_address, 
ip->num_base_address);
  }
  }

@@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
*adev)
  }

next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 
1);
+   ip_offset += struct_size(ip, base_address, 
ip->num_base_address);
  }
  }

@@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device 
*adev, int hw_id, int n
  *revision = ip->revision;
  return 0;
  }
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 
1);
+   ip_offset += struct_size(ip, base_address, 
ip->num_base_address);
  }
  }



Kind regards,

Paul

Re: [PATCHv2] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Felix Kuehling


On 2022-04-11 at 22:15, Mukul Joshi wrote:

Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
values are consistent after the device is removed:
a. Adding a device and removing a GPU device are made mutually
   exclusive.
b. The global proximity domain counter is no longer required to be
   an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
changed due to device removal.

CC: Shuotao Xu 
Signed-off-by: Mukul Joshi 
Reviewed-by: Shuotao Xu 


Reviewed-by: Felix Kuehling 



---
v1->v2:
- Remove comments from inside kfd_topology_update_io_links()
   and add them as kernel-doc comments.

  drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 83 ---
  3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1eaabd2cb41b..afc8a7fcdad8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct 
crat_subtype_iolink *iolink,
 * table, add corresponded reversed direction link now.
 */
if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
-   to_dev = kfd_topology_device_by_proximity_domain(id_to);
+   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
if (!to_dev)
return -ENODEV;
/* same everything but the other direction */
@@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 */
if (kdev->hive_id) {
for (nid = 0; nid < proximity_domain; ++nid) {
-   peer_dev = kfd_topology_device_by_proximity_domain(nid);
+   peer_dev = 
kfd_topology_device_by_proximity_domain_no_lock(nid);
if (!peer_dev->gpu)
continue;
if (peer_dev->gpu->hive_id != kdev->hive_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e1b7e6afa920..8a43def1f638 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
  int kfd_topology_remove_device(struct kfd_dev *gpu);
  struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
uint32_t proximity_domain);
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
+   uint32_t proximity_domain);
  struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3bdcae239bc0..98a51847cd8c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -46,22 +46,32 @@ static struct list_head topology_device_list;
  static struct kfd_system_properties sys_props;
  
  static DECLARE_RWSEM(topology_lock);

-static atomic_t topology_crat_proximity_domain;
+static uint32_t topology_crat_proximity_domain;
  
-struct kfd_topology_device *kfd_topology_device_by_proximity_domain(

+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
uint32_t proximity_domain)
  {
struct kfd_topology_device *top_dev;
struct kfd_topology_device *device = NULL;
  
-	down_read(&topology_lock);

-
list_for_each_entry(top_dev, &topology_device_list, list)
if (top_dev->proximity_domain == proximity_domain) {
device = top_dev;
break;
}
  
+	return device;

+}
+
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+   uint32_t proximity_domain)
+{
+   struct kfd_topology_device *device = NULL;
+
+   down_read(&topology_lock);
+
+   device = kfd_topology_device_by_proximity_domain_no_lock(
+   proximity_domain);
up_read(&topology_lock);
  
  	return device;

@@ -1060,7 +1070,7 @@ int kfd_topology_init(void)
down_write(&topology_lock);
kfd_topology_update_device_list(&temp_topology_device_list,

[PATCHv2] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Mukul Joshi

Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
   values are consistent after the device is removed:
   a. Adding a device and removing a GPU device are made mutually
  exclusive.
   b. The global proximity domain counter is no longer required to be
  an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
   changed due to device removal.

CC: Shuotao Xu 
Signed-off-by: Mukul Joshi 
Reviewed-by: Shuotao Xu 
---
v1->v2:
- Remove comments from inside kfd_topology_update_io_links()
  and add them as kernel-doc comments.

 drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 83 ---
 3 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1eaabd2cb41b..afc8a7fcdad8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct 
crat_subtype_iolink *iolink,
 * table, add corresponded reversed direction link now.
 */
if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
-   to_dev = kfd_topology_device_by_proximity_domain(id_to);
+   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
if (!to_dev)
return -ENODEV;
/* same everything but the other direction */
@@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 */
if (kdev->hive_id) {
for (nid = 0; nid < proximity_domain; ++nid) {
-   peer_dev = kfd_topology_device_by_proximity_domain(nid);
+   peer_dev = 
kfd_topology_device_by_proximity_domain_no_lock(nid);
if (!peer_dev->gpu)
continue;
if (peer_dev->gpu->hive_id != kdev->hive_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e1b7e6afa920..8a43def1f638 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
 int kfd_topology_remove_device(struct kfd_dev *gpu);
 struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
uint32_t proximity_domain);
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
+   uint32_t proximity_domain);
 struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
 struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3bdcae239bc0..98a51847cd8c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -46,22 +46,32 @@ static struct list_head topology_device_list;
 static struct kfd_system_properties sys_props;
 
 static DECLARE_RWSEM(topology_lock);
-static atomic_t topology_crat_proximity_domain;
+static uint32_t topology_crat_proximity_domain;
 
-struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
uint32_t proximity_domain)
 {
struct kfd_topology_device *top_dev;
struct kfd_topology_device *device = NULL;
 
-   down_read(&topology_lock);
-
list_for_each_entry(top_dev, &topology_device_list, list)
if (top_dev->proximity_domain == proximity_domain) {
device = top_dev;
break;
}
 
+   return device;
+}
+
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+   uint32_t proximity_domain)
+{
+   struct kfd_topology_device *device = NULL;
+
+   down_read(&topology_lock);
+
+   device = kfd_topology_device_by_proximity_domain_no_lock(
+   proximity_domain);
up_read(&topology_lock);
 
return device;
@@ -1060,7 +1070,7 @@ int kfd_topology_init(void)
down_write(&topology_lock);
kfd_topology_update_device_list(&temp_topology_device_list,
&topology_device_list);
-   atomic_set(&topology_crat_proximi

RE: [PATCH ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Zhang, Jesse(Jie)

[AMD Official Use Only]

Thanks, Paul Menzel.
The patch file is attached.

-Original Message-
From: Paul Menzel  
Sent: Monday, 11 April 2022 11:21 pm
To: Zhang, Jesse(Jie) 
Cc: amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Huang, 
Ray 
Subject: Re: [PATCH ] drm/amdgpu: fix discovery ip failed

[CAUTION: External Email]

Dear Jie,


Thank you for your patch.


On 11.04.22 at 17:15, Zhang, Jesse(Jie) wrote:

You might want to add a space before the (.

> [AMD Official Use Only]

Please send a patch with `git format-patch` or similar.

> Fix discovery ip failed, and the log:

On what system?

> [   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> number_instance (64) from ip discovery blob
> [   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (45056) from ip discovery blob
> [   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> number_instance (66) from ip discovery blob
> [   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (45568) from ip discovery blob
> [   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> number_instance (66) from ip discovery blob
> [   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (480) from ip discovery blob
> [   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (608) from ip discovery blob
> [   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (640) from ip discovery blob
> [   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> number_instance (64) from ip discovery blob
> [   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> number_instance (64) from ip discovery blob
> [   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected 
> hw_id (28672) from ip discovery blob

Please describe the reason for the failure, and your fix.

A Signed-off-by line is missing.

> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 766006a075ec..a778b0392e9f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct 
> amdgpu_device *adev,
>  res = kobject_add(&ip_hw_instance->kobj, NULL,
>"%d", 
> ip_hw_instance->num_instance);
> next_ip:
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address 
> - 1);
> +   ip_offset += struct_size(ip, base_address, 
> ip->num_base_address);
>  }
>  }
>
> @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device 
> *adev)
>  }
>
> next_ip:
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address 
> - 1);
> +   ip_offset += struct_size(ip, base_address, 
> ip->num_base_address);
>  }
>  }
>
> @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct 
> amdgpu_device *adev, int hw_id, int n
>  *revision = ip->revision;
>  return 0;
>  }
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address 
> - 1);
> +   ip_offset += struct_size(ip, base_address, 
> ip->num_base_address);
>  }
>  }
>

Kind regards,

Paul
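
The quoted hunks replace a hand-computed record size with `struct_size()`. As a rough user-space sketch of why the two formulas can disagree, assume (the thread does not say so) that the trailing `base_address` array is declared as a flexible array member instead of a one-element array. `struct ip_rec`, `ip_rec_size()` and `ip_rec_size_old()` are hypothetical stand-ins, not the amdgpu definitions, and the real kernel helper additionally saturates on overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: fixed header followed by a variable number of
 * 32-bit base addresses, declared as a flexible array member. */
struct ip_rec {
    uint16_t hw_id;
    uint8_t  number_instance;
    uint8_t  num_base_address;
    uint32_t base_address[];
};

/* Header plus n array elements; this is the arithmetic that
 * struct_size(ip, base_address, n) performs, without its overflow checks. */
static size_t ip_rec_size(const struct ip_rec *ip)
{
    return sizeof(*ip) +
           (size_t)ip->num_base_address * sizeof(ip->base_address[0]);
}

/* Old formula: only correct if sizeof(*ip) already contains one element. */
static size_t ip_rec_size_old(const struct ip_rec *ip)
{
    assert(ip->num_base_address > 0); /* n - 1 would wrap for n == 0 */
    return sizeof(*ip) + 4 * ((size_t)ip->num_base_address - 1);
}
```

Under that assumption the old formula comes up one element (4 bytes) short per record, so every following record would be read from the wrong offset. That would be consistent with the implausible `hw_id` and `number_instance` values in the log, although the thread itself does not confirm the cause.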


0001-fix-ip-discovery-failed-SWDEV-332054.patch
Description: 0001-fix-ip-discovery-failed-SWDEV-332054.patch

Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Felix Kuehling


On 2022-04-11 at 21:14, Joshi, Mukul wrote:

> [AMD Official Use Only]
>
> > -Original Message-
> > From: Kuehling, Felix
> > Sent: Monday, April 11, 2022 8:16 PM
> > To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org
> > Cc: Shuotao Xu
> > Subject: Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device
> > removal
> >
> > On 2022-04-07 at 12:15, Mukul Joshi wrote:
> > > Currently, the IO-links to the device being removed from topology, are
> > > not cleared. As a result, there would be dangling links left in the
> > > KFD topology. This patch aims to fix the following:
> > > 1. Cleanup all IO links to the device being removed.
> > > 2. Ensure that node numbering in sysfs and nodes proximity domain
> > >    values are consistent after the device is removed:
> > >    a. Adding a device and removing a GPU device are made mutually
> > >       exclusive.
> > >    b. The global proximity domain counter is no longer required to be
> > >       an atomic counter. A normal 32-bit counter can be used instead.
> > > 3. Update generation_count to let user-mode know that topology has
> > >    changed due to device removal.
> > >
> > > CC: Shuotao Xu
> > > Signed-off-by: Mukul Joshi
> >
> > Looks good to me. I have two nit-picks inline.
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
> > >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
> > >   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 ---
> > >   3 files changed, 74 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > > b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > > index 1eaabd2cb41b..afc8a7fcdad8 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > > @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
> > >  * table, add corresponded reversed direction link now.
> > >  */
> > > if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
> > > -   to_dev = kfd_topology_device_by_proximity_domain(id_to);
> > > +   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
> > > if (!to_dev)
> > > return -ENODEV;
> > > /* same everything but the other direction */
> > > @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
> > >  */
> > > if (kdev->hive_id) {
> > > for (nid = 0; nid < proximity_domain; ++nid) {
> > > -   peer_dev = kfd_topology_device_by_proximity_domain(nid);
> > > +   peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid);
> > > if (!peer_dev->gpu)
> > > continue;
> > > if (peer_dev->gpu->hive_id != kdev->hive_id)
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > index e1b7e6afa920..8a43def1f638 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > > @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
> > >   int kfd_topology_remove_device(struct kfd_dev *gpu);
> > >   struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> > > uint32_t proximity_domain);
> > > +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
> > > +   uint32_t proximity_domain);
> > >   struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
> > >   struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
> > >   struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
> > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > > index 3bdcae239bc0..874a273b81f7 100644
> > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > > @@ -46,27 +46,38 @@ static struct list_head topology_device_list;
> > >   static struct kfd_system_properties sys_props;
> > >
> > >   static DECLARE_RWSEM(topology_lock);
> > > -static atomic_t topology_crat_proximity_domain;
> > > +static uint32_t topology_crat_proximity_domain;
> > >
> > > -struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> > > +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
> > > uint32_t proximity_domain)
> >
> > I remember we discussed this and I suggested splitting a no_lock version out
> > of this function. But now I don't see it being used anywhere. Was that lost
> > somewhere in refactoring or porting to the upstream branch?
> > Maybe the no_lock version isn't needed any more.
> >
> It's used in the changes in kfd_crat.c (in kfd_create_vcrat_image_gpu() and
> kfd_parse_subtype_iolink()) and below in
> kfd_topology_device_by_proximity_domain().

You're right, I missed the changes in kfd_crat.c. And they are needed 
because the whole CRAT table parsing is now under the topology lock. 
Thanks for the reminder.
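
The locked/unlocked split discussed here can be sketched in user space as follows. This is only an illustration: the lock is modelled as a boolean flag that asserts on re-acquisition rather than a real rw-semaphore, and `topo_dev`, `topo_find()`, `topo_find_no_lock()` and `topo_add()` are hypothetical names, not the KFD functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct topo_dev {
    uint32_t proximity_domain;
    struct topo_dev *next;
};

static struct topo_dev *topo_list; /* protected by the lock below */
static bool topo_locked;           /* stand-in for a non-recursive lock */

static void topo_lock(void)   { assert(!topo_locked); topo_locked = true; }
static void topo_unlock(void) { assert(topo_locked);  topo_locked = false; }

/* Lookup for callers that already hold the lock. */
static struct topo_dev *topo_find_no_lock(uint32_t domain)
{
    assert(topo_locked);
    for (struct topo_dev *d = topo_list; d; d = d->next)
        if (d->proximity_domain == domain)
            return d;
    return NULL;
}

/* Convenience wrapper for callers that do not hold the lock. */
static struct topo_dev *topo_find(uint32_t domain)
{
    topo_lock();
    struct topo_dev *d = topo_find_no_lock(domain);
    topo_unlock();
    return d;
}

/* A longer operation that holds the lock throughout must use the
 * _no_lock variant; calling topo_find() here would re-acquire the lock. */
static bool topo_add(struct topo_dev *dev)
{
    bool added = false;

    topo_lock();
    if (!topo_find_no_lock(dev->proximity_domain)) {
        dev->next = topo_list;
        topo_list = dev;
        added = true;
    }
    topo_unlock();
    return added;
}
```

The point of the split is that the lookup logic lives in one place, while the decision about who takes the lock stays with the caller.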


Regards,
  Felix





>
> > >   {
> > > struct kfd_topology_device *top_dev;
> > > struct kfd_topology_device *device = NULL;
> > >
> > > -   down_read(&topology_lock);
> > > -
> > > l

RE: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Joshi, Mukul

[AMD Official Use Only]



> -Original Message-
> From: Kuehling, Felix 
> Sent: Monday, April 11, 2022 8:16 PM
> To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org
> Cc: Shuotao Xu 
> Subject: Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device
> removal
> 
> On 2022-04-07 at 12:15, Mukul Joshi wrote:
> > Currently, the IO-links to the device being removed from topology, are
> > not cleared. As a result, there would be dangling links left in the
> > KFD topology. This patch aims to fix the following:
> > 1. Cleanup all IO links to the device being removed.
> > 2. Ensure that node numbering in sysfs and nodes proximity domain
> > values are consistent after the device is removed:
> > a. Adding a device and removing a GPU device are made mutually
> >exclusive.
> > b. The global proximity domain counter is no longer required to be
> >an atomic counter. A normal 32-bit counter can be used instead.
> > 3. Update generation_count to let user-mode know that topology has
> > changed due to device removal.
> >
> > CC: Shuotao Xu 
> > Signed-off-by: Mukul Joshi 
> 
> Looks good to me. I have two nit-picks inline.
> 
> 
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 ---
> >   3 files changed, 74 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > index 1eaabd2cb41b..afc8a7fcdad8 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > @@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
> >  * table, add corresponded reversed direction link now.
> >  */
> > if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
> > -   to_dev = kfd_topology_device_by_proximity_domain(id_to);
> > +   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
> > if (!to_dev)
> > return -ENODEV;
> > /* same everything but the other direction */
> > @@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
> >  */
> > if (kdev->hive_id) {
> > for (nid = 0; nid < proximity_domain; ++nid) {
> > -   peer_dev = kfd_topology_device_by_proximity_domain(nid);
> > +   peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid);
> > if (!peer_dev->gpu)
> > continue;
> > if (peer_dev->gpu->hive_id != kdev->hive_id)
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > index e1b7e6afa920..8a43def1f638 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > @@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
> >   int kfd_topology_remove_device(struct kfd_dev *gpu);
> >   struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> > uint32_t proximity_domain);
> > +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
> > +   uint32_t proximity_domain);
> >   struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
> >   struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
> >   struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > index 3bdcae239bc0..874a273b81f7 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> > @@ -46,27 +46,38 @@ static struct list_head topology_device_list;
> >   static struct kfd_system_properties sys_props;
> >
> >   static DECLARE_RWSEM(topology_lock);
> > -static atomic_t topology_crat_proximity_domain;
> > +static uint32_t topology_crat_proximity_domain;
> >
> > -struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> > +struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
> > uint32_t proximity_domain)
> 
> I remember we discussed this and I suggested splitting a no_lock version out
> of this function. But now I don't see it being used anywhere. Was that lost
> somewhere in refactoring or porting to the upstream branch?
> Maybe the no_lock version isn't needed any more.
> 
It's used in the changes in kfd_crat.c (in kfd_create_vcrat_image_gpu() and
kfd_parse_subtype_iolink()) and below in
kfd_topology_device_by_proximity_domain().

> 
> >   {
> > struct kfd_topology_device *top_dev;
> > struct kfd_topology_device *device = NULL;
> >
> > -   down_read(&topology_lock);
> > -
>

[PATCH AUTOSEL 4.9 1/7] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 6a3470f84998..732713ff3190 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -607,6 +607,8 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
INIT_LIST_HEAD(&event_waiters[i].waiters);
-- 
2.35.1
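
The pattern this backport applies (check the allocation before touching any element) can be sketched in user space. This is a hedged illustration only: `calloc()` stands in for `kmalloc_array()` because both reject a count-times-size overflow, and `struct waiter_sketch` and `alloc_waiters()` are hypothetical names, not the KFD code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-event bookkeeping record. */
struct waiter_sketch {
    uint32_t index;
    int activated;
};

/* Returns an initialised array of num records, or NULL if the
 * allocation failed. The caller owns the result and must free() it. */
static struct waiter_sketch *alloc_waiters(uint32_t num)
{
    struct waiter_sketch *w;

    assert(num > 0); /* calloc(0, ...) may legitimately return NULL */

    w = calloc(num, sizeof(*w));
    if (!w)
        return NULL; /* bail out before the loop dereferences w */

    for (uint32_t i = 0; i < num; i++) {
        w[i].index = i;
        w[i].activated = 0;
    }
    return w;
}
```

Returning early keeps the initialisation loop free of per-iteration NULL tests and makes the failure path explicit for the caller.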

[PATCH AUTOSEL 4.14 02/11] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 944abfad39c1..1d8dd81dfc70 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -607,6 +607,8 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
INIT_LIST_HEAD(&event_waiters[i].waiters);
-- 
2.35.1

[PATCH AUTOSEL 4.14 01/11] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h 
b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 06192698bd96..c90567de8bf7 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

[PATCH AUTOSEL 4.19 10/12] drm/amd/display: Fix allocate_mst_payload assert on resume

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]

[Why]
On resume we do link detection for all non-MST connectors.
MST is handled separately. However the condition for telling
if connector is on mst branch is not enough for mst hub case.
Link detection for mst branch link leads to mst topology reset.
That causes assert in dc_link_allocate_mst_payload()

[How]
Use link type as indicator for mst link.

Reviewed-by: Wayne Lin 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ed02bb6b2cd0..1d1183c47beb 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -777,7 +777,8 @@ static int dm_resume(void *handle)
 * this is the case when traversing through already created
 * MST connectors, should be skipped
 */
-   if (aconnector->mst_port)
+   if (aconnector->dc_link &&
+   aconnector->dc_link->type == dc_connection_mst_branch)
continue;
 
mutex_lock(&aconnector->hpd_lock);
-- 
2.35.1

[PATCH AUTOSEL 4.19 02/12] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index e9f0e0a1b41c..892077377339 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -532,6 +532,8 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
init_wait(&event_waiters[i].wait);
-- 
2.35.1

[PATCH AUTOSEL 4.19 01/12] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h 
b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 5b393622f592..a0f0a17e224f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

[PATCH AUTOSEL 5.4 16/21] drm/amd/display: Fix allocate_mst_payload assert on resume

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]

[Why]
On resume we do link detection for all non-MST connectors.
MST is handled separately. However the condition for telling
if connector is on mst branch is not enough for mst hub case.
Link detection for mst branch link leads to mst topology reset.
That causes assert in dc_link_allocate_mst_payload()

[How]
Use link type as indicator for mst link.

Reviewed-by: Wayne Lin 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8e4d863c7570..a83d33a92650 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1210,7 +1210,8 @@ static int dm_resume(void *handle)
 * this is the case when traversing through already created
 * MST connectors, should be skipped
 */
-   if (aconnector->mst_port)
+   if (aconnector->dc_link &&
+   aconnector->dc_link->type == dc_connection_mst_branch)
continue;
 
mutex_lock(&aconnector->hpd_lock);
-- 
2.35.1

[PATCH AUTOSEL 5.4 05/21] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index d674d4b3340f..adbb2fec2e0f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -532,6 +532,8 @@ static struct kfd_event_waiter 
*alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
init_wait(&event_waiters[i].wait);
-- 
2.35.1

[PATCH AUTOSEL 5.4 04/21] drm/amdkfd: Fix Incorrect VMIDs passed to HWS

2022-04-11 Thread Sasha Levin

From: Tushar Patel 

[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index e8e172010416..ffd754713522 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -633,7 +633,7 @@ MODULE_PARM_DESC(sched_policy,
  * Maximum number of processes that HWS can schedule concurrently. The maximum 
is the
  * number of VMIDs assigned to the HWS, which is also the default.
  */
-int hws_max_conc_proc = 8;
+int hws_max_conc_proc = -1;
 module_param(hws_max_conc_proc, int, 0444);
 MODULE_PARM_DESC(hws_max_conc_proc,
"Max # processes HWS can execute concurrently when sched_policy=0 (0 = 
no concurrency, #VMIDs for KFD = Maximum(default))");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index ad9483b9eea3..60ee1a832112 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -609,15 +609,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
- kfd->vm_info.first_vmid_kfd + 1;
 
/* Verify module parameters regarding mapped process number*/
-   if ((hws_max_conc_proc < 0)
-   || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
-   dev_err(kfd_device,
-   "hws_max_conc_proc %d must be between 0 and %d, use %d 
instead\n",
-   hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
-   kfd->vm_info.vmid_num_kfd);
+   if (hws_max_conc_proc >= 0)
+   kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, 
kfd->vm_info.vmid_num_kfd);
+   else
kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
-   } else
-   kfd->max_proc_per_quantum = hws_max_conc_proc;
 
/* Allocate global GWS that is shared by all KFD processes */
if (hws_gws_support && amdgpu_amdkfd_alloc_gws(kfd->kgd,
-- 
2.35.1
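
The parameter handling introduced by this patch (a negative value selects the hardware maximum, anything else is clamped to it) can be sketched as a small pure function. `effective_max_procs()` is a hypothetical helper written for illustration; the driver does this inline in `kgd2kfd_device_init()`.

```c
#include <assert.h>
#include <stdint.h>

/* requested < 0 : use every VMID that is available.
 * requested >= 0: use the request, but never more than is available. */
static uint32_t effective_max_procs(int requested, uint32_t vmids_available)
{
    if (requested >= 0) {
        uint32_t r = (uint32_t)requested;

        return r < vmids_available ? r : vmids_available;
    }
    return vmids_available;
}
```

With a default of -1 instead of 8, a device that exposes more than 8 VMIDs is no longer capped at 8 unless the user asks for that explicitly.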

[PATCH AUTOSEL 5.4 03/21] drm/amd/display: Update VTEM Infopacket definition

2022-04-11 Thread Sasha Levin

From: "Leo (Hanghong) Ma" 

[ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ]

[Why & How]
The latest HDMI SPEC has updated the VTEM packet structure,
so change the VTEM Infopacket defined in the driver side to align
with the SPEC.

Reviewed-by: Chris Park 
Acked-by: Alex Hung 
Signed-off-by: Leo (Hanghong) Ma 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c 
b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
index d885d642ed7f..537736713598 100644
--- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
+++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
@@ -85,7 +85,8 @@
 //PB7 = MD0
 #define MASK_VTEM_MD0__VRR_EN 0x01
 #define MASK_VTEM_MD0__M_CONST0x02
-#define MASK_VTEM_MD0__RESERVED2  0x0C
+#define MASK_VTEM_MD0__QMS_EN 0x04
+#define MASK_VTEM_MD0__RESERVED2  0x08
 #define MASK_VTEM_MD0__FVA_FACTOR_M1  0xF0
 
 //MD1
@@ -94,7 +95,7 @@
 //MD2
 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98  0x03
 #define MASK_VTEM_MD2__RB0x04
-#define MASK_VTEM_MD2__RESERVED3 0xF8
+#define MASK_VTEM_MD2__NEXT_TFR  0xF8
 
 //MD3
 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07  0xFF
-- 
2.35.1
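
To illustrate how the updated MD0 masks partition the byte, here is a minimal sketch. The mask values are taken from the patch above; `vtem_md0_pack()` and `vtem_md0_qms_en()` are hypothetical helpers, and the shift of 4 for the FVA factor is inferred from the 0xF0 mask, not from driver code shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_VTEM_MD0__VRR_EN        0x01
#define MASK_VTEM_MD0__M_CONST       0x02
#define MASK_VTEM_MD0__QMS_EN        0x04
#define MASK_VTEM_MD0__RESERVED2     0x08
#define MASK_VTEM_MD0__FVA_FACTOR_M1 0xF0

/* Build the MD0 byte from its fields; the reserved bit is left at zero. */
static uint8_t vtem_md0_pack(bool vrr_en, bool m_const, bool qms_en,
                             uint8_t fva_factor_m1)
{
    uint8_t md0 = 0;

    assert(fva_factor_m1 <= 0x0F); /* field is four bits wide */

    if (vrr_en)
        md0 |= MASK_VTEM_MD0__VRR_EN;
    if (m_const)
        md0 |= MASK_VTEM_MD0__M_CONST;
    if (qms_en)
        md0 |= MASK_VTEM_MD0__QMS_EN;
    md0 |= (uint8_t)((fva_factor_m1 << 4) & MASK_VTEM_MD0__FVA_FACTOR_M1);
    return md0;
}

/* Test the bit that the old definition treated as reserved. */
static bool vtem_md0_qms_en(uint8_t md0)
{
    return (md0 & MASK_VTEM_MD0__QMS_EN) != 0;
}
```

Bit 2 was previously covered by the two-bit reserved mask 0x0C; after the change only bit 3 (0x08) remains reserved.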

[PATCH AUTOSEL 5.4 02/21] drm/amd/display: fix audio format not updated after edid updated

2022-04-11 Thread Sasha Levin

From: Charlene Liu 

[ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ]

[Why]
When an EDID change only alters the audio format, the driver
still needs to update the stream.

Reviewed-by: Alvin Lee 
Reviewed-by: Aric Cyr 
Acked-by: Alex Hung 
Signed-off-by: Charlene Liu 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 95a5310e9e66..de246e183d6b 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1546,8 +1546,8 @@ bool dc_is_stream_unchanged(
if (old_stream->ignore_msa_timing_param != 
stream->ignore_msa_timing_param)
return false;
 
-   // Only Have Audio left to check whether it is same or not. This is a 
corner case for Tiled sinks
-   if (old_stream->audio_info.mode_count != stream->audio_info.mode_count)
+   /*compare audio info*/
+   if (memcmp(&old_stream->audio_info, &stream->audio_info, 
sizeof(stream->audio_info)) != 0)
return false;
 
return true;
-- 
2.35.1
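
The difference between the old check (mode count only) and the new one (whole-structure comparison) can be shown with a small sketch. `struct audio_info_sketch` and both comparison helpers are hypothetical; the layout was chosen to have no padding bytes, since `memcmp()` on a structure with uninitialised padding can report spurious differences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical audio description: four one-byte fields per mode. */
struct audio_mode_sketch {
    uint8_t format_code;
    uint8_t channel_count;
    uint8_t sample_rates;
    uint8_t sample_size;
};

struct audio_info_sketch {
    uint32_t mode_count;
    struct audio_mode_sketch modes[4];
};

/* Old behaviour: two descriptions match if they list equally many modes. */
static bool audio_same_by_count(const struct audio_info_sketch *a,
                                const struct audio_info_sketch *b)
{
    return a->mode_count == b->mode_count;
}

/* New behaviour: every byte of the description has to match. */
static bool audio_same_by_memcmp(const struct audio_info_sketch *a,
                                 const struct audio_info_sketch *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```

Two descriptions with one mode each but different format codes pass the count check and fail the byte-wise check, which is the case the commit message describes.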

[PATCH AUTOSEL 5.4 01/21] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h 
b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 5b393622f592..a0f0a17e224f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

[PATCH AUTOSEL 5.10 24/30] drm/amd/display: Fix allocate_mst_payload assert on resume

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]

[Why]
On resume we do link detection for all non-MST connectors.
MST is handled separately. However the condition for telling
if connector is on mst branch is not enough for mst hub case.
Link detection for mst branch link leads to mst topology reset.
That causes assert in dc_link_allocate_mst_payload()

[How]
Use link type as indicator for mst link.

Reviewed-by: Wayne Lin 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e828f9414ba2..7bb151283f44 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2022,7 +2022,8 @@ static int dm_resume(void *handle)
 * this is the case when traversing through already created
 * MST connectors, should be skipped
 */
-   if (aconnector->mst_port)
+   if (aconnector->dc_link &&
+   aconnector->dc_link->type == dc_connection_mst_branch)
continue;
 
mutex_lock(&aconnector->hpd_lock);
-- 
2.35.1
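
As a rough sketch of the condition change in the hunk above, the two skip predicates can be written side by side. All type and function names here are hypothetical stand-ins for the driver structures, and the "hub root connector" scenario is our reading of the commit message, not something the patch states in code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's link and connector types. */
enum connection_type_sketch {
	CONNECTION_NONE_SKETCH,
	CONNECTION_SINGLE_SKETCH,
	CONNECTION_MST_BRANCH_SKETCH,
};

struct link_sketch {
	enum connection_type_sketch type;
};

struct connector_sketch {
	const struct link_sketch *dc_link; /* may be NULL */
	const void *mst_port;              /* non-NULL only for MST-created connectors */
};

/* Old predicate: skip only connectors that hang off an MST port. */
static bool skip_on_resume_old(const struct connector_sketch *c)
{
	return c->mst_port != NULL;
}

/* New predicate: skip any connector whose link is an MST branch. */
static bool skip_on_resume_new(const struct connector_sketch *c)
{
	return c->dc_link && c->dc_link->type == CONNECTION_MST_BRANCH_SKETCH;
}
```

For a connector with no `mst_port` but a link of MST branch type, the old predicate does not skip detection while the new one does. The NULL test on `dc_link` mirrors the guard in the patch.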

[PATCH AUTOSEL 5.10 23/30] drm/amd/display: Revert FEC check in validation

2022-04-11 Thread Sasha Levin

From: Martin Leung 

[ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ]

why and how:
causes failure on install on certain machines

Reviewed-by: George Shen 
Acked-by: Alex Hung 
Signed-off-by: Martin Leung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index ac5323596c65..93f5229c303e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1173,10 +1173,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
-   /* Check for FEC status*/
-   if (link->link_enc->funcs->fec_is_active(link->link_enc))
-   return false;
-
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.10 09/30] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index ba2c2ce0c55a..159be13ef20b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
init_wait(&event_waiters[i].wait);
-- 
2.35.1
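
The early-return pattern added by the hunk above can be tried out in user space. This is a minimal sketch, assuming a pluggable allocator so the failure path can be exercised; `waiter_sketch`, `alloc_fn`, `failing_alloc` and `alloc_waiters_sketch` are hypothetical names, and `calloc()` merely stands in for `kmalloc_array()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kfd_event_waiter. */
struct waiter_sketch {
	int initialized;
};

/* Allocator signature matching calloc(), so failure can be injected. */
typedef void *(*alloc_fn)(size_t n, size_t size);

/* Test allocator that always fails, like kmalloc_array() under memory pressure. */
static void *failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

static struct waiter_sketch *alloc_waiters_sketch(uint32_t num_events,
						  alloc_fn alloc)
{
	struct waiter_sketch *waiters = alloc(num_events, sizeof(*waiters));
	uint32_t i;

	/* Bail out before touching the array if the allocation failed. */
	if (!waiters)
		return NULL;

	for (i = 0; i < num_events; i++)
		waiters[i].initialized = 1;

	return waiters;
}
```

Callers still have to check the returned pointer; the sketch only shows that the initialisation loop is never entered with a NULL array.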

[PATCH AUTOSEL 5.10 08/30] drm/amdgpu/vcn: improve vcn dpg stop procedure

2022-04-11 Thread Sasha Levin

From: Tianci Yin 

[ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ]

Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in
S3 resuming.

Reviewed-by: James Zhu 
Signed-off-by: Tianci Yin 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 2099f6ebd833..bdb8e596bda6 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1429,8 +1429,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev)
 
 static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
 {
+   struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE};
uint32_t tmp;
 
+   vcn_v3_0_pause_dpg_mode(adev, 0, &state);
+
/* Wait for power status to be 1 */
SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
-- 
2.35.1

[PATCH AUTOSEL 5.10 07/30] drm/amdkfd: Fix Incorrect VMIDs passed to HWS

2022-04-11 Thread Sasha Levin

From: Tushar Patel 

[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ed13a2f76884..30659c1776e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -632,7 +632,7 @@ MODULE_PARM_DESC(sched_policy,
  * Maximum number of processes that HWS can schedule concurrently. The maximum is the
  * number of VMIDs assigned to the HWS, which is also the default.
  */
-int hws_max_conc_proc = 8;
+int hws_max_conc_proc = -1;
 module_param(hws_max_conc_proc, int, 0444);
 MODULE_PARM_DESC(hws_max_conc_proc,
"Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 84313135c2ea..148e43dee657 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -664,15 +664,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
- kfd->vm_info.first_vmid_kfd + 1;
 
/* Verify module parameters regarding mapped process number*/
-   if ((hws_max_conc_proc < 0)
-   || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
-   dev_err(kfd_device,
-   "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
-   hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
-   kfd->vm_info.vmid_num_kfd);
+   if (hws_max_conc_proc >= 0)
+   kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd);
+   else
kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
-   } else
-   kfd->max_proc_per_quantum = hws_max_conc_proc;
 
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
-- 
2.35.1
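
The new parameter handling in the hunk above amounts to "negative means use every VMID, otherwise clamp to the VMID count". A small stand-alone sketch, assuming a hypothetical helper name `max_proc_per_quantum_sketch` and using a ternary in place of the kernel's `min()` macro:

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched logic:
 *   hws_max_conc_proc < 0   -> default, use all VMIDs assigned to KFD
 *   hws_max_conc_proc >= 0  -> use the request, capped at the VMID count
 */
static uint32_t max_proc_per_quantum_sketch(int hws_max_conc_proc,
					    uint32_t vmid_num_kfd)
{
	if (hws_max_conc_proc >= 0) {
		uint32_t requested = (uint32_t)hws_max_conc_proc;

		return requested < vmid_num_kfd ? requested : vmid_num_kfd;
	}

	return vmid_num_kfd;
}
```

With 16 VMIDs, a value of -1 yields 16, a value of 8 yields 8, and an oversized request such as 32 is silently capped at 16. The old code instead logged an error for out-of-range values.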

[PATCH AUTOSEL 5.10 06/30] drm/amd/display: Update VTEM Infopacket definition

2022-04-11 Thread Sasha Levin

From: "Leo (Hanghong) Ma" 

[ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ]

[Why & How]
The latest HDMI SPEC has updated the VTEM packet structure,
so change the VTEM Infopacket defined in the driver side to align
with the SPEC.

Reviewed-by: Chris Park 
Acked-by: Alex Hung 
Signed-off-by: Leo (Hanghong) Ma 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/modules/info_packet/info_packet.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
index 0fdf7a3e96de..96e18050a617 100644
--- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
+++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
@@ -100,7 +100,8 @@ enum vsc_packet_revision {
 //PB7 = MD0
 #define MASK_VTEM_MD0__VRR_EN 0x01
 #define MASK_VTEM_MD0__M_CONST 0x02
-#define MASK_VTEM_MD0__RESERVED2  0x0C
+#define MASK_VTEM_MD0__QMS_EN 0x04
+#define MASK_VTEM_MD0__RESERVED2  0x08
 #define MASK_VTEM_MD0__FVA_FACTOR_M1  0xF0
 
 //MD1
@@ -109,7 +110,7 @@ enum vsc_packet_revision {
 //MD2
 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98  0x03
 #define MASK_VTEM_MD2__RB 0x04
-#define MASK_VTEM_MD2__RESERVED3 0xF8
+#define MASK_VTEM_MD2__NEXT_TFR  0xF8
 
 //MD3
 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07  0xFF
-- 
2.35.1
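
To illustrate how the updated MD0 masks in the hunk above partition one byte, here is a small packing sketch. The mask values are copied from the patch; the helper names and the shift of 4 for `FVA_FACTOR_M1` are assumptions inferred from the `0xF0` mask, not code taken from the driver.

```c
#include <stdint.h>

/* Mask values as they appear in the patch above. */
#define MASK_VTEM_MD0__VRR_EN         0x01
#define MASK_VTEM_MD0__M_CONST        0x02
#define MASK_VTEM_MD0__QMS_EN         0x04
#define MASK_VTEM_MD0__RESERVED2      0x08
#define MASK_VTEM_MD0__FVA_FACTOR_M1  0xF0

/* Hypothetical helper: pack the MD0 byte from individual fields. */
static uint8_t vtem_md0_pack_sketch(int vrr_en, int m_const, int qms_en,
				    uint8_t fva_factor_m1)
{
	uint8_t md0 = 0;

	if (vrr_en)
		md0 |= MASK_VTEM_MD0__VRR_EN;
	if (m_const)
		md0 |= MASK_VTEM_MD0__M_CONST;
	if (qms_en)
		md0 |= MASK_VTEM_MD0__QMS_EN;
	/* Assumed layout: the factor occupies the upper nibble. */
	md0 |= (uint8_t)((fva_factor_m1 << 4) & MASK_VTEM_MD0__FVA_FACTOR_M1);

	return md0;
}

/* OR of all MD0 masks; 0xFF means the fields cover the byte exactly. */
static uint8_t vtem_md0_all_masks_sketch(void)
{
	return MASK_VTEM_MD0__VRR_EN | MASK_VTEM_MD0__M_CONST |
	       MASK_VTEM_MD0__QMS_EN | MASK_VTEM_MD0__RESERVED2 |
	       MASK_VTEM_MD0__FVA_FACTOR_M1;
}
```

The patch splits the former two-bit reserved field (`0x0C`) into `QMS_EN` (`0x04`) and a one-bit reserved field (`0x08`), so the five masks still cover all eight bits without overlap.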

[PATCH AUTOSEL 5.10 05/30] drm/amd/display: FEC check in timing validation

2022-04-11 Thread Sasha Levin

From: Chiawen Huang 

[ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ]

[Why]
disable/enable leads FEC mismatch between hw/sw FEC state.

[How]
check FEC status to fastboot on/off.

Reviewed-by: Anthony Koo 
Acked-by: Alex Hung 
Signed-off-by: Chiawen Huang 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 93f5229c303e..ac5323596c65 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1173,6 +1173,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
+   /* Check for FEC status*/
+   if (link->link_enc->funcs->fec_is_active(link->link_enc))
+   return false;
+
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.10 04/30] drm/amd/display: fix audio format not updated after edid updated

2022-04-11 Thread Sasha Levin

From: Charlene Liu 

[ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ]

[why]
for the case edid change only changed audio format.
driver still need to update stream.

Reviewed-by: Alvin Lee 
Reviewed-by: Aric Cyr 
Acked-by: Alex Hung 
Signed-off-by: Charlene Liu 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 5f4cdb05c4db..a21f1141fbb0 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1698,8 +1698,8 @@ bool dc_is_stream_unchanged(
if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param)
return false;
 
-   // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks
-   if (old_stream->audio_info.mode_count != stream->audio_info.mode_count)
+   /*compare audio info*/
+   if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0)
return false;
 
return true;
-- 
2.35.1

[PATCH AUTOSEL 5.10 01/30] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 5b393622f592..a0f0a17e224f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

[PATCH AUTOSEL 5.15 31/41] drm/amd/display: Fix allocate_mst_payload assert on resume

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]

[Why]
On resume we do link detection for all non-MST connectors.
MST is handled separately. However the condition for telling
if connector is on mst branch is not enough for mst hub case.
Link detection for mst branch link leads to mst topology reset.
That causes assert in dc_link_allocate_mst_payload()

[How]
Use link type as indicator for mst link.

Reviewed-by: Wayne Lin 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b9859e52ad92..091524bdc673 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2296,7 +2296,8 @@ static int dm_resume(void *handle)
 * this is the case when traversing through already created
 * MST connectors, should be skipped
 */
-   if (aconnector->mst_port)
+   if (aconnector->dc_link &&
+   aconnector->dc_link->type == dc_connection_mst_branch)
continue;
 
mutex_lock(&aconnector->hpd_lock);
-- 
2.35.1

[PATCH AUTOSEL 5.15 30/41] drm/amd/display: Revert FEC check in validation

2022-04-11 Thread Sasha Levin

From: Martin Leung 

[ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ]

why and how:
causes failure on install on certain machines

Reviewed-by: George Shen 
Acked-by: Alex Hung 
Signed-off-by: Martin Leung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 35a27fe48f66..b37c4d2e7a1e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1377,10 +1377,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
-   /* Check for FEC status*/
-   if (link->link_enc->funcs->fec_is_active(link->link_enc))
-   return false;
-
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.15 29/41] drm/amd/display: Enable power gating before init_pipes

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit 58e16c752e9540b28a873c44c3bee83e022007c1 ]

[Why]
In init_hw() we call init_pipes() before enabling power gating.
init_pipes() tries to power gate dsc but it may fail because
required force-ons are not released yet.
As a result with dsc config the following errors observed on resume:
"REG_WAIT timeout 1us * 1000 tries - dcn20_dsc_pg_control"
"REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control"
"REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control"

[How]
Move enable_power_gating_plane() before init_pipes() in init_hw()

Reviewed-by: Anthony Koo 
Reviewed-by: Eric Yang 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 5 +++--
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c        | 5 +++--
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c        | 5 +++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 3af49cdf89eb..174dd149fee7 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1435,6 +1435,9 @@ void dcn10_init_hw(struct dc *dc)
}
}
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -1487,8 +1490,6 @@ void dcn10_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (dc->clk_mgr->funcs->notify_wm_ranges)
dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index 0950784bafa4..f83457375811 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -570,6 +570,9 @@ void dcn30_init_hw(struct dc *dc)
}
}
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -647,8 +650,6 @@ void dcn30_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
index 3afa1159a5f7..b72d080b302a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
@@ -204,6 +204,9 @@ void dcn31_init_hw(struct dc *dc)
}
}
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -287,8 +290,6 @@ void dcn31_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
-- 
2.35.1

[PATCH AUTOSEL 5.15 11/41] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

As the kmalloc_array() may return null, the 'event_waiters[i].wait' would lead 
to null-pointer dereference.
Therefore, it is better to check the return value of kmalloc_array() to avoid 
this confusion.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 3eea4edee355..b8bdd796cd91 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
init_wait(&event_waiters[i].wait);
-- 
2.35.1

[PATCH AUTOSEL 5.15 10/41] drm/amdgpu/vcn: improve vcn dpg stop procedure

2022-04-11 Thread Sasha Levin

From: Tianci Yin 

[ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ]

Prior to disabling dpg, VCN need unpausing dpg mode, or VCN will hang in
S3 resuming.

Reviewed-by: James Zhu 
Signed-off-by: Tianci Yin 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index 3d18aab88b4e..a026b2eaec21 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1508,8 +1508,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev)
 
 static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
 {
+   struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE};
uint32_t tmp;
 
+   vcn_v3_0_pause_dpg_mode(adev, 0, &state);
+
/* Wait for power status to be 1 */
SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
-- 
2.35.1

[PATCH AUTOSEL 5.15 09/41] drm/amdkfd: Fix Incorrect VMIDs passed to HWS

2022-04-11 Thread Sasha Levin

From: Tushar Patel 

[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 5a7fef324c82..b517b76e96a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -680,7 +680,7 @@ MODULE_PARM_DESC(sched_policy,
  * Maximum number of processes that HWS can schedule concurrently. The maximum is the
  * number of VMIDs assigned to the HWS, which is also the default.
  */
-int hws_max_conc_proc = 8;
+int hws_max_conc_proc = -1;
 module_param(hws_max_conc_proc, int, 0444);
 MODULE_PARM_DESC(hws_max_conc_proc,
"Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 88c483f69989..660eb7097cfc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -834,15 +834,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
}
 
/* Verify module parameters regarding mapped process number*/
-   if ((hws_max_conc_proc < 0)
-   || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
-   dev_err(kfd_device,
-   "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
-   hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
-   kfd->vm_info.vmid_num_kfd);
+   if (hws_max_conc_proc >= 0)
+   kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd);
+   else
kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
-   } else
-   kfd->max_proc_per_quantum = hws_max_conc_proc;
 
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
-- 
2.35.1

[PATCH AUTOSEL 5.15 08/41] drm/amd/display: Update VTEM Infopacket definition

2022-04-11 Thread Sasha Levin

From: "Leo (Hanghong) Ma" 

[ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ]

[Why & How]
The latest HDMI SPEC has updated the VTEM packet structure,
so change the VTEM Infopacket defined in the driver side to align
with the SPEC.

Reviewed-by: Chris Park 
Acked-by: Alex Hung 
Signed-off-by: Leo (Hanghong) Ma 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/modules/info_packet/info_packet.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
index 57f198de5e2c..4e075b01d48b 100644
--- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
+++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
@@ -100,7 +100,8 @@ enum vsc_packet_revision {
 //PB7 = MD0
 #define MASK_VTEM_MD0__VRR_EN 0x01
 #define MASK_VTEM_MD0__M_CONST 0x02
-#define MASK_VTEM_MD0__RESERVED2  0x0C
+#define MASK_VTEM_MD0__QMS_EN 0x04
+#define MASK_VTEM_MD0__RESERVED2  0x08
 #define MASK_VTEM_MD0__FVA_FACTOR_M1  0xF0
 
 //MD1
@@ -109,7 +110,7 @@ enum vsc_packet_revision {
 //MD2
 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98  0x03
 #define MASK_VTEM_MD2__RB 0x04
-#define MASK_VTEM_MD2__RESERVED3 0xF8
+#define MASK_VTEM_MD2__NEXT_TFR  0xF8
 
 //MD3
 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07  0xFF
-- 
2.35.1

[PATCH AUTOSEL 5.15 07/41] drm/amd/display: FEC check in timing validation

2022-04-11 Thread Sasha Levin

From: Chiawen Huang 

[ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ]

[Why]
disable/enable leads FEC mismatch between hw/sw FEC state.

[How]
check FEC status to fastboot on/off.

Reviewed-by: Anthony Koo 
Acked-by: Alex Hung 
Signed-off-by: Chiawen Huang 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index b37c4d2e7a1e..35a27fe48f66 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1377,6 +1377,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
+   /* Check for FEC status*/
+   if (link->link_enc->funcs->fec_is_active(link->link_enc))
+   return false;
+
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.15 06/41] drm/amd/display: fix audio format not updated after edid updated

2022-04-11 Thread Sasha Levin

From: Charlene Liu 

[ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ]

[why]
for the case edid change only changed audio format.
driver still need to update stream.

Reviewed-by: Alvin Lee 
Reviewed-by: Aric Cyr 
Acked-by: Alex Hung 
Signed-off-by: Charlene Liu 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 7ae409f7dcf8..8c285d585249 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1623,8 +1623,8 @@ bool dc_is_stream_unchanged(
if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param)
return false;
 
-   // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks
-   if (old_stream->audio_info.mode_count != stream->audio_info.mode_count)
+   /*compare audio info*/
+   if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0)
return false;
 
return true;
-- 
2.35.1

[PATCH AUTOSEL 5.15 05/41] drm/amdgpu/gmc: use PCI BARs for APUs in passthrough

2022-04-11 Thread Sasha Levin

From: Alex Deucher 

[ Upstream commit b818a5d374542ccec73dcfe578a081574029820e ]

If the GPU is passed through to a guest VM, use the PCI
BAR for CPU FB access rather than the physical address of
carve out.  The physical address is not valid in a guest.

v2: Fix HDP handing as suggested by Michel

Reviewed-by: Christian König 
Reviewed-by: Michel Dänzer 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 5 +++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 33026b3eafd2..2f2ae26a8068 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5625,7 +5625,7 @@ void amdgpu_device_flush_hdp(struct amdgpu_device *adev,
struct amdgpu_ring *ring)
 {
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU)
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev))
return;
 #endif
if (adev->gmc.xgmi.connected_to_cpu)
@@ -5641,7 +5641,7 @@ void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev,
struct amdgpu_ring *ring)
 {
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU)
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev))
return;
 #endif
if (adev->gmc.xgmi.connected_to_cpu)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index 3c01be661014..93a4da4284ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -788,7 +788,7 @@ static int gmc_v10_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU) {
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = adev->gfxhub.funcs->get_mc_fb_offset(adev);
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 0a50fdaced7e..63c47f61d0df 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -381,8 +381,9 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU &&
-   adev->gmc.real_vram_size > adev->gmc.aper_size) {
+   if ((adev->flags & AMD_IS_APU) &&
+   adev->gmc.real_vram_size > adev->gmc.aper_size &&
+   !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 63b890f1e8af..bef9610084f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -581,7 +581,7 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU) {
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 6dc16ccf6c81..0e731016921b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1387,7 +1387,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
 */
 
/* check whether both host-gpu and gpu-gpu xgmi links exist */
-   if ((adev->flags & AMD_IS_APU) ||
+   if (((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) ||
(adev->gmc.xgmi.supported &&
 adev->gmc.xgmi.connected_to_cpu)) {
adev->gmc.aper_base =
-- 
2.35.1
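
The condition repeated across the gmc hunks above can be summarised as one predicate: use the carve-out physical address only for an APU that is not passed through to a guest, and otherwise keep the PCI BAR. The sketch below uses hypothetical names throughout (`dev_sketch`, `AMD_IS_APU_SKETCH`, `use_carveout_aperture_sketch`); the flag value is made up, and the `passthrough` field stands in for the driver's `amdgpu_passthrough()` helper.

```c
#include <stdbool.h>

/* Hypothetical flag value; the real AMD_IS_APU bit is defined by the driver. */
#define AMD_IS_APU_SKETCH (1u << 0)

/* Hypothetical, reduced device description. */
struct dev_sketch {
	unsigned int flags;
	bool passthrough; /* stands in for amdgpu_passthrough(adev) */
};

/*
 * True when the aperture should point at the carve-out physical address.
 * In a guest VM that address is not valid, so the PCI BAR is kept instead.
 */
static bool use_carveout_aperture_sketch(const struct dev_sketch *dev)
{
	return (dev->flags & AMD_IS_APU_SKETCH) && !dev->passthrough;
}
```

The same predicate also decides whether the HDP flush and invalidate calls are skipped in the `amdgpu_device.c` hunks, which is why those two functions gain the identical test.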

[PATCH AUTOSEL 5.15 04/41] drm/amdgpu: conduct a proper cleanup of PDB bo

2022-04-11 Thread Sasha Levin

From: Guchun Chen 

[ Upstream commit 2d505453f38e18d42ba7d5428aaa17aaa7752c65 ]

Use amdgpu_bo_free_kernel instead of amdgpu_bo_unref to
perform a proper cleanup of PDB bo.

v2: update subject to be more accurate

Signed-off-by: Guchun Chen 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index c67e21244342..6dc16ccf6c81 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1652,7 +1652,7 @@ static int gmc_v9_0_sw_fini(void *handle)
amdgpu_gem_force_release(adev);
amdgpu_vm_manager_fini(adev);
amdgpu_gart_table_vram_free(adev);
-   amdgpu_bo_unref(&adev->gmc.pdb0_bo);
+   amdgpu_bo_free_kernel(&adev->gmc.pdb0_bo, NULL, &adev->gmc.ptr_pdb0);
amdgpu_bo_fini(adev);
 
return 0;
-- 
2.35.1

[PATCH AUTOSEL 5.15 01/41] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 5b393622f592..a0f0a17e224f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

[PATCH AUTOSEL 5.17 38/49] drm/amd/display: Fix allocate_mst_payload assert on resume

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit f4346fb3edf7720db3f7f5e1cab1f667cd024280 ]

[Why]
On resume we do link detection for all non-MST connectors.
MST is handled separately. However, the condition for telling
whether a connector is on an MST branch is not sufficient for the MST hub case.
Link detection on an MST branch link leads to an MST topology reset,
which causes an assert in dc_link_allocate_mst_payload().

[How]
Use link type as indicator for mst link.

Reviewed-by: Wayne Lin 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b28b5c490860..4ba173b47617 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2693,7 +2693,8 @@ static int dm_resume(void *handle)
 * this is the case when traversing through already created
 * MST connectors, should be skipped
 */
-   if (aconnector->mst_port)
+   if (aconnector->dc_link &&
+   aconnector->dc_link->type == dc_connection_mst_branch)
continue;
 
mutex_lock(&aconnector->hpd_lock);
-- 
2.35.1

[PATCH AUTOSEL 5.17 37/49] drm/amd/display: Revert FEC check in validation

2022-04-11 Thread Sasha Levin

From: Martin Leung 

[ Upstream commit b2075fce104b88b789c15ef1ed2b91dc94198e26 ]

why and how:
causes failure on install on certain machines

Reviewed-by: George Shen 
Acked-by: Alex Hung 
Signed-off-by: Martin Leung 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index a62711d71d0e..ba1aa994db4b 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1493,10 +1493,6 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
-   /* Check for FEC status*/
-   if (link->link_enc->funcs->fec_is_active(link->link_enc))
-   return false;
-
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.17 36/49] drm/amd/display: Enable power gating before init_pipes

2022-04-11 Thread Sasha Levin

From: Roman Li 

[ Upstream commit 58e16c752e9540b28a873c44c3bee83e022007c1 ]

[Why]
In init_hw() we call init_pipes() before enabling power gating.
init_pipes() tries to power gate dsc but it may fail because
required force-ons are not released yet.
As a result, with a DSC config the following errors are observed on resume:
"REG_WAIT timeout 1us * 1000 tries - dcn20_dsc_pg_control"
"REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control"
"REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control"

[How]
Move enable_power_gating_plane() before init_pipes() in init_hw()

Reviewed-by: Anthony Koo 
Reviewed-by: Eric Yang 
Acked-by: Alex Hung 
Signed-off-by: Roman Li 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 5 +++--
 drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c| 5 +++--
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c| 5 +++--
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 530a72e3eefe..39d5ceb9a373 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -1508,6 +1508,9 @@ void dcn10_init_hw(struct dc *dc)
if (dc->config.power_down_display_on_boot)
dc_link_blank_all_dp_displays(dc);
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -1560,8 +1563,6 @@ void dcn10_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (dc->clk_mgr->funcs->notify_wm_ranges)
dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index 1db1ca19411d..05dc0a3ae2a3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -548,6 +548,9 @@ void dcn30_init_hw(struct dc *dc)
if (dc->config.power_down_display_on_boot)
dc_link_blank_all_dp_displays(dc);
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -625,8 +628,6 @@ void dcn30_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
index 4206ce5bf9a9..1d2682104670 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c
@@ -200,6 +200,9 @@ void dcn31_init_hw(struct dc *dc)
if (dc->config.power_down_display_on_boot)
dc_link_blank_all_dp_displays(dc);
 
+   if (hws->funcs.enable_power_gating_plane)
+   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
+
/* If taking control over from VBIOS, we may want to optimize our first
 * mode set, so we need to skip powering down pipes until we know which
 * pipes we want to use.
@@ -249,8 +252,6 @@ void dcn31_init_hw(struct dc *dc)
 
REG_UPDATE(DCFCLK_CNTL, DCFCLK_GATE_DIS, 0);
}
-   if (hws->funcs.enable_power_gating_plane)
-   hws->funcs.enable_power_gating_plane(dc->hwseq, true);
 
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
-- 
2.35.1

[PATCH AUTOSEL 5.17 35/49] drm/amd/display: Correct Slice reset calculation

2022-04-11 Thread Sasha Levin

From: Chris Park 

[ Upstream commit 862a876c3a6372f2fa9d0c6510f1976ac94fc857 ]

[Why]
Once DSC slice cannot fit pixel clock, we incorrectly
reset min slices to 0 and allow max slice to operate,
even when max slice itself cannot fit the pixel clock
properly.

[How]
Change the sequence such that we correctly determine
DSC is not possible when both min slices and max
slices cannot fit pixel clock per slice.

Reviewed-by: Wenjing Liu 
Acked-by: Alex Hung 
Signed-off-by: Chris Park 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
index 9c74564cbd8d..8973d3a38f9c 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
@@ -864,11 +864,11 @@ static bool setup_dsc_config(
min_slices_h = inc_num_slices(dsc_common_caps.slice_caps, min_slices_h);
}
 
+   is_dsc_possible = (min_slices_h <= max_slices_h);
+
if (pic_width % min_slices_h != 0)
min_slices_h = 0; // DSC TODO: Maybe try increasing the number of slices first?
 
-   is_dsc_possible = (min_slices_h <= max_slices_h);
-
if (min_slices_h == 0 && max_slices_h == 0)
is_dsc_possible = false;
 
-- 
2.35.1

[PATCH AUTOSEL 5.17 12/49] drm/amdkfd: Check for potential null return of kmalloc_array()

2022-04-11 Thread Sasha Levin

From: QintaoShen 

[ Upstream commit ebbb7bb9e80305820dc2328a371c1b35679f2667 ]

kmalloc_array() may return NULL, in which case accessing
'event_waiters[i].wait' would be a null-pointer dereference.
Check the return value of kmalloc_array() to avoid that.

Signed-off-by: QintaoShen 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index afe72dd11325..6ca7e12bdab8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -531,6 +531,8 @@ static struct kfd_event_waiter *alloc_event_waiters(uint32_t num_events)
event_waiters = kmalloc_array(num_events,
sizeof(struct kfd_event_waiter),
GFP_KERNEL);
+   if (!event_waiters)
+   return NULL;
 
for (i = 0; (event_waiters) && (i < num_events) ; i++) {
init_wait(&event_waiters[i].wait);
-- 
2.35.1
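
The early return added by the patch above follows a common allocation pattern. Below is a minimal userspace sketch of it, assuming malloc() as a stand-in for kmalloc_array(); struct waiter and alloc_waiters() are hypothetical names, not the KFD ones. The explicit overflow check mirrors the fact that kmalloc_array() itself refuses a count times size that overflows.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kfd_event_waiter. */
struct waiter {
    int activated;
};

/* Allocate and initialise an array of waiters; return NULL on failure
 * so the caller never touches an unallocated array. */
static struct waiter *alloc_waiters(size_t num)
{
    struct waiter *w;
    size_t i;

    /* Reject sizes whose byte count would overflow size_t. */
    if (num != 0 && num > SIZE_MAX / sizeof(*w))
        return NULL;

    w = malloc(num * sizeof(*w));
    if (!w)
        return NULL;    /* bail out before the init loop */

    for (i = 0; i < num; i++)
        w[i].activated = 0;

    return w;
}
```

Returning before the loop keeps the loop condition simple, since it no longer has to test the pointer on every iteration.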

[PATCH AUTOSEL 5.17 11/49] drm/amdgpu/vcn: improve vcn dpg stop procedure

2022-04-11 Thread Sasha Levin

From: Tianci Yin 

[ Upstream commit 6ea239adc2a712eb318f04f5c29b018ba65ea38a ]

Before dpg is disabled, VCN needs to unpause dpg mode; otherwise VCN hangs
on S3 resume.

Reviewed-by: James Zhu 
Signed-off-by: Tianci Yin 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index da11ceba0698..2ae56b2ed318 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -1474,8 +1474,11 @@ static int vcn_v3_0_start_sriov(struct amdgpu_device *adev)
 
 static int vcn_v3_0_stop_dpg_mode(struct amdgpu_device *adev, int inst_idx)
 {
+   struct dpg_pause_state state = {.fw_based = VCN_DPG_STATE__UNPAUSE};
uint32_t tmp;
 
+   vcn_v3_0_pause_dpg_mode(adev, 0, &state);
+
/* Wait for power status to be 1 */
SOC15_WAIT_ON_RREG(VCN, inst_idx, mmUVD_POWER_STATUS, 1,
UVD_POWER_STATUS__UVD_POWER_STATUS_MASK);
-- 
2.35.1

[PATCH AUTOSEL 5.17 10/49] drm/amdkfd: Fix Incorrect VMIDs passed to HWS

2022-04-11 Thread Sasha Levin

From: Tushar Patel 

[ Upstream commit b7dfbd2e601f3fee545bc158feceba4f340fe7cf ]

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing correct number of VMIDs to HWS

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 11 +++
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0ead08ba58c2..7672790d8b6b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -686,7 +686,7 @@ MODULE_PARM_DESC(sched_policy,
  * Maximum number of processes that HWS can schedule concurrently. The maximum is the
  * number of VMIDs assigned to the HWS, which is also the default.
  */
-int hws_max_conc_proc = 8;
+int hws_max_conc_proc = -1;
 module_param(hws_max_conc_proc, int, 0444);
 MODULE_PARM_DESC(hws_max_conc_proc,
"Max # processes HWS can execute concurrently when sched_policy=0 (0 = no concurrency, #VMIDs for KFD = Maximum(default))");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 2b65d0acae2c..2fdbe2f475e4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -480,15 +480,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
}
 
/* Verify module parameters regarding mapped process number*/
-   if ((hws_max_conc_proc < 0)
-   || (hws_max_conc_proc > kfd->vm_info.vmid_num_kfd)) {
-   dev_err(kfd_device,
-   "hws_max_conc_proc %d must be between 0 and %d, use %d instead\n",
-   hws_max_conc_proc, kfd->vm_info.vmid_num_kfd,
-   kfd->vm_info.vmid_num_kfd);
+   if (hws_max_conc_proc >= 0)
+   kfd->max_proc_per_quantum = min((u32)hws_max_conc_proc, kfd->vm_info.vmid_num_kfd);
+   else
kfd->max_proc_per_quantum = kfd->vm_info.vmid_num_kfd;
-   } else
-   kfd->max_proc_per_quantum = hws_max_conc_proc;
 
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
-- 
2.35.1

[PATCH AUTOSEL 5.17 09/49] drm/amd/display: Update VTEM Infopacket definition

2022-04-11 Thread Sasha Levin

From: "Leo (Hanghong) Ma" 

[ Upstream commit c9fbf6435162ed5fb7201d1d4adf6585c6a8c327 ]

[Why & How]
The latest HDMI SPEC has updated the VTEM packet structure,
so change the VTEM Infopacket defined in the driver side to align
with the SPEC.

Reviewed-by: Chris Park 
Acked-by: Alex Hung 
Signed-off-by: Leo (Hanghong) Ma 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 .../gpu/drm/amd/display/modules/info_packet/info_packet.c| 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
index 57f198de5e2c..4e075b01d48b 100644
--- a/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
+++ b/drivers/gpu/drm/amd/display/modules/info_packet/info_packet.c
@@ -100,7 +100,8 @@ enum vsc_packet_revision {
 //PB7 = MD0
 #define MASK_VTEM_MD0__VRR_EN 0x01
 #define MASK_VTEM_MD0__M_CONST0x02
-#define MASK_VTEM_MD0__RESERVED2  0x0C
+#define MASK_VTEM_MD0__QMS_EN 0x04
+#define MASK_VTEM_MD0__RESERVED2  0x08
 #define MASK_VTEM_MD0__FVA_FACTOR_M1  0xF0
 
 //MD1
@@ -109,7 +110,7 @@ enum vsc_packet_revision {
 //MD2
 #define MASK_VTEM_MD2__BASE_REFRESH_RATE_98  0x03
 #define MASK_VTEM_MD2__RB0x04
-#define MASK_VTEM_MD2__RESERVED3 0xF8
+#define MASK_VTEM_MD2__NEXT_TFR  0xF8
 
 //MD3
 #define MASK_VTEM_MD3__BASE_REFRESH_RATE_07  0xFF
-- 
2.35.1

[PATCH AUTOSEL 5.17 08/49] drm/amd/display: FEC check in timing validation

2022-04-11 Thread Sasha Levin

From: Chiawen Huang 

[ Upstream commit 7d56a154e22ffb3613fdebf83ec34d5225a22993 ]

[Why]
A disable/enable cycle leads to a mismatch between the hw and sw FEC state.

[How]
check FEC status to fastboot on/off.

Reviewed-by: Anthony Koo 
Acked-by: Alex Hung 
Signed-off-by: Chiawen Huang 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index ba1aa994db4b..a62711d71d0e 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -1493,6 +1493,10 @@ bool dc_validate_seamless_boot_timing(const struct dc *dc,
if (!link->link_enc->funcs->is_dig_enabled(link->link_enc))
return false;
 
+   /* Check for FEC status*/
+   if (link->link_enc->funcs->fec_is_active(link->link_enc))
+   return false;
+
enc_inst = link->link_enc->funcs->get_dig_frontend(link->link_enc);
 
if (enc_inst == ENGINE_ID_UNKNOWN)
-- 
2.35.1

[PATCH AUTOSEL 5.17 07/49] drm/amd/display: fix audio format not updated after edid updated

2022-04-11 Thread Sasha Levin

From: Charlene Liu 

[ Upstream commit 5e8a71cf13bc9184fee915b2220be71b4c6cac74 ]

[why]
When an EDID change only changes the audio format,
the driver still needs to update the stream.

Reviewed-by: Alvin Lee 
Reviewed-by: Aric Cyr 
Acked-by: Alex Hung 
Signed-off-by: Charlene Liu 
Tested-by: Daniel Wheeler 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index 18757c158523..bb8a762cf8da 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1664,8 +1664,8 @@ bool dc_is_stream_unchanged(
if (old_stream->ignore_msa_timing_param != stream->ignore_msa_timing_param)
return false;
 
-   // Only Have Audio left to check whether it is same or not. This is a corner case for Tiled sinks
-   if (old_stream->audio_info.mode_count != stream->audio_info.mode_count)
+   /*compare audio info*/
+   if (memcmp(&old_stream->audio_info, &stream->audio_info, sizeof(stream->audio_info)) != 0)
return false;
 
return true;
-- 
2.35.1

[PATCH AUTOSEL 5.17 06/49] drm/amdgpu/gmc: use PCI BARs for APUs in passthrough

2022-04-11 Thread Sasha Levin

From: Alex Deucher 

[ Upstream commit b818a5d374542ccec73dcfe578a081574029820e ]

If the GPU is passed through to a guest VM, use the PCI
BAR for CPU FB access rather than the physical address of
carve out.  The physical address is not valid in a guest.

v2: Fix HDP handling as suggested by Michel

Reviewed-by: Christian König 
Reviewed-by: Michel Dänzer 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 5 +++--
 drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f18c698137a6..b590795c1bc4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5678,7 +5678,7 @@ void amdgpu_device_flush_hdp(struct amdgpu_device *adev,
struct amdgpu_ring *ring)
 {
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU)
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev))
return;
 #endif
if (adev->gmc.xgmi.connected_to_cpu)
@@ -5694,7 +5694,7 @@ void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev,
struct amdgpu_ring *ring)
 {
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU)
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev))
return;
 #endif
if (adev->gmc.xgmi.connected_to_cpu)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index a2f8ed0e6a64..f1b794d5d87d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -788,7 +788,7 @@ static int gmc_v10_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU) {
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = adev->gfxhub.funcs->get_mc_fb_offset(adev);
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index ab8adbff9e2d..5206e2da334a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -381,8 +381,9 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU &&
-   adev->gmc.real_vram_size > adev->gmc.aper_size) {
+   if ((adev->flags & AMD_IS_APU) &&
+   adev->gmc.real_vram_size > adev->gmc.aper_size &&
+   !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 054733838292..d07d36786836 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -581,7 +581,7 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
adev->gmc.aper_size = pci_resource_len(adev->pdev, 0);
 
 #ifdef CONFIG_X86_64
-   if (adev->flags & AMD_IS_APU) {
+   if ((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) {
adev->gmc.aper_base = ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22;
adev->gmc.aper_size = adev->gmc.real_vram_size;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 34ee75cf7954..2fb24178eaef 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1420,7 +1420,7 @@ static int gmc_v9_0_mc_init(struct amdgpu_device *adev)
 */
 
/* check whether both host-gpu and gpu-gpu xgmi links exist */
-   if ((adev->flags & AMD_IS_APU) ||
+   if (((adev->flags & AMD_IS_APU) && !amdgpu_passthrough(adev)) ||
(adev->gmc.xgmi.supported &&
 adev->gmc.xgmi.connected_to_cpu)) {
adev->gmc.aper_base =
-- 
2.35.1

[PATCH AUTOSEL 5.17 05/49] drm/amdgpu: conduct a proper cleanup of PDB bo

2022-04-11 Thread Sasha Levin

From: Guchun Chen 

[ Upstream commit 2d505453f38e18d42ba7d5428aaa17aaa7752c65 ]

Use amdgpu_bo_free_kernel instead of amdgpu_bo_unref to
perform a proper cleanup of PDB bo.

v2: update subject to be more accurate

Signed-off-by: Guchun Chen 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 88c1eb9ad068..34ee75cf7954 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1684,7 +1684,7 @@ static int gmc_v9_0_sw_fini(void *handle)
amdgpu_gem_force_release(adev);
amdgpu_vm_manager_fini(adev);
amdgpu_gart_table_vram_free(adev);
-   amdgpu_bo_unref(&adev->gmc.pdb0_bo);
+   amdgpu_bo_free_kernel(&adev->gmc.pdb0_bo, NULL, &adev->gmc.ptr_pdb0);
amdgpu_bo_fini(adev);
 
return 0;
-- 
2.35.1

[PATCH AUTOSEL 5.17 02/49] drm/amd: Add USBC connector ID

2022-04-11 Thread Sasha Levin

From: Aurabindo Pillai 

[ Upstream commit c5c948aa894a831f96fccd025e47186b1ee41615 ]

[Why&How] Add a dedicated AMDGPU specific ID for use with
newer ASICs that support USB-C output

Signed-off-by: Aurabindo Pillai 
Reviewed-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/ObjectID.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/ObjectID.h b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
index 5b393622f592..a0f0a17e224f 100644
--- a/drivers/gpu/drm/amd/amdgpu/ObjectID.h
+++ b/drivers/gpu/drm/amd/amdgpu/ObjectID.h
@@ -119,6 +119,7 @@
 #define CONNECTOR_OBJECT_ID_eDP   0x14
 #define CONNECTOR_OBJECT_ID_MXM   0x15
 #define CONNECTOR_OBJECT_ID_LVDS_eDP  0x16
+#define CONNECTOR_OBJECT_ID_USBC  0x17
 
 /* deleted */
 
-- 
2.35.1

Re: [PATCH] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Felix Kuehling


On 2022-04-07 at 12:15, Mukul Joshi wrote:

Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
values are consistent after the device is removed:
a. Adding a device and removing a GPU device are made mutually
   exclusive.
b. The global proximity domain counter is no longer required to be
   an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
changed due to device removal.

CC: Shuotao Xu 
Signed-off-by: Mukul Joshi 


Looks good to me. I have two nit-picks inline.



---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 ---
  3 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1eaabd2cb41b..afc8a7fcdad8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 * table, add corresponded reversed direction link now.
 */
if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
-   to_dev = kfd_topology_device_by_proximity_domain(id_to);
+   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
if (!to_dev)
return -ENODEV;
/* same everything but the other direction */
@@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 */
if (kdev->hive_id) {
for (nid = 0; nid < proximity_domain; ++nid) {
-   peer_dev = kfd_topology_device_by_proximity_domain(nid);
+   peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid);
if (!peer_dev->gpu)
continue;
if (peer_dev->gpu->hive_id != kdev->hive_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e1b7e6afa920..8a43def1f638 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
  int kfd_topology_remove_device(struct kfd_dev *gpu);
  struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
uint32_t proximity_domain);
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
+   uint32_t proximity_domain);
  struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3bdcae239bc0..874a273b81f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -46,27 +46,38 @@ static struct list_head topology_device_list;
  static struct kfd_system_properties sys_props;
  
  static DECLARE_RWSEM(topology_lock);

-static atomic_t topology_crat_proximity_domain;
+static uint32_t topology_crat_proximity_domain;
  
-struct kfd_topology_device *kfd_topology_device_by_proximity_domain(

+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
uint32_t proximity_domain)


I remember we discussed this and I suggested splitting a no_lock version 
out of this function. But now I don't see it being used anywhere. Was 
that lost somewhere in refactoring or porting to the upstream branch? 
Maybe the no_lock version isn't needed any more.




  {
struct kfd_topology_device *top_dev;
struct kfd_topology_device *device = NULL;
  
-	down_read(&topology_lock);

-
list_for_each_entry(top_dev, &topology_device_list, list)
if (top_dev->proximity_domain == proximity_domain) {
device = top_dev;
break;
}
  
+	return device;

+}
+
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+   uint32_t proximity_domain)
+{
+   struct kfd_topology_device *device = NULL;
+
+   down_read(&topology_lock);
+
+   device = kfd_topology_device_by_proximity_domain_no_lock(
+   proximity_domain);
up_read(&topology_lock);
  
  	return device;

  }
  
+

  s

Re: [PATCH 1/2] drm/amdkfd: Cleanup IO links during KFD device removal

2022-04-11 Thread Felix Kuehling


On 2022-04-08 at 04:45, Shuotao Xu wrote:

Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
values are consistent after the device is removed:
a. Adding a device and removing a GPU device are made mutually
   exclusive.
b. The global proximity domain counter is no longer required to be
   an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
changed due to device removal.

Reviewed-by: Shuotao Xu 
CC: Shuotao Xu 
Signed-off-by: Mukul Joshi 


This looks like Mukul's patch, but with you as the author (otherwise I 
would have expected a "From: Mukul ..." line at the start of the email). 
Did you make any changes to it?


Regards,
  Felix



---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c |  4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h |  2 +
  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 79 ---
  3 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1eaabd2cb41b..afc8a7fcdad8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1056,7 +1056,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 * table, add corresponded reversed direction link now.
 */
if (props && (iolink->flags & CRAT_IOLINK_FLAGS_BI_DIRECTIONAL)) {
-   to_dev = kfd_topology_device_by_proximity_domain(id_to);
+   to_dev = kfd_topology_device_by_proximity_domain_no_lock(id_to);
if (!to_dev)
return -ENODEV;
/* same everything but the other direction */
@@ -2225,7 +2225,7 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 */
if (kdev->hive_id) {
for (nid = 0; nid < proximity_domain; ++nid) {
-   peer_dev = kfd_topology_device_by_proximity_domain(nid);
+   peer_dev = kfd_topology_device_by_proximity_domain_no_lock(nid);
if (!peer_dev->gpu)
continue;
if (peer_dev->gpu->hive_id != kdev->hive_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index e1b7e6afa920..8a43def1f638 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1016,6 +1016,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
  int kfd_topology_remove_device(struct kfd_dev *gpu);
  struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
uint32_t proximity_domain);
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
+   uint32_t proximity_domain);
  struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
  struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 3bdcae239bc0..874a273b81f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -46,27 +46,38 @@ static struct list_head topology_device_list;
 static struct kfd_system_properties sys_props;
 
 static DECLARE_RWSEM(topology_lock);
-static atomic_t topology_crat_proximity_domain;
+static uint32_t topology_crat_proximity_domain;
 
-struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain_no_lock(
 					uint32_t proximity_domain)
 {
 	struct kfd_topology_device *top_dev;
 	struct kfd_topology_device *device = NULL;
 
-	down_read(&topology_lock);
-
 	list_for_each_entry(top_dev, &topology_device_list, list)
 		if (top_dev->proximity_domain == proximity_domain) {
 			device = top_dev;
 			break;
 		}
 
+	return device;
+}
+
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+					uint32_t proximity_domain)
+{
+	struct kfd_topology_device *device = NULL;
+
+	down_read(&topology_lock);
+
+	device = kfd_topology_device_by_proximity_domain_no_lock(
+							proximity_domain);
 	up_read(&topology_lock);
 
 	return device;
 }
 
+
 struct kfd_topology_device *kfd_topology_device_by_id(uint32_t gpu_id)
 {
struct kfd_topo
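
The hunk above splits one lookup into a lock-free helper and a locking wrapper, presumably so that callers which already hold topology_lock can use the helper without taking the lock a second time. A minimal userspace sketch of that pattern follows. Every name in it is hypothetical, and the lock is modeled by a plain counter rather than a real rwsem, so it only illustrates the call structure and is not the driver's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a topology list entry. */
struct node {
	unsigned int domain;
	struct node *next;
};

/* Models the lock: nonzero while "held". Not a real lock. */
static int lock_depth;

static void mock_lock(void)   { lock_depth++; }
static void mock_unlock(void) { lock_depth--; }

/* Helper: walks the list; the caller is responsible for locking. */
static struct node *find_by_domain_no_lock(struct node *head,
					   unsigned int domain)
{
	struct node *n;

	for (n = head; n; n = n->next)
		if (n->domain == domain)
			return n;
	return NULL;
}

/* Wrapper: takes the lock, delegates to the helper, drops the lock. */
static struct node *find_by_domain(struct node *head, unsigned int domain)
{
	struct node *n;

	mock_lock();
	n = find_by_domain_no_lock(head, domain);
	mock_unlock();
	return n;
}
```

Code that is already inside a locked region would call find_by_domain_no_lock() directly; everything else goes through find_by_domain().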

Re: [PATCHv2] drm/amdgpu: disable ASPM on Intel AlderLake based systems

2022-04-11 Thread kernel test robot

Hi Richard,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on v5.18-rc2 next-20220411]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Richard-Gong/drm-amdgpu-disable-ASPM-on-Intel-AlderLake-based-systems/20220409-030656
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20220412/202204120618.4i6douw9-...@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-19) 11.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/bb9a037cafa91918c2ece823591d1d04b812ae17
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Richard-Gong/drm-amdgpu-disable-ASPM-on-Intel-AlderLake-based-systems/20220409-030656
        git checkout bb9a037cafa91918c2ece823591d1d04b812ae17
        # save the config file to linux build tree
        mkdir build_dir
        make W=1 O=build_dir ARCH=um SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from arch/x86/um/asm/processor.h:41,
from include/linux/mutex.h:19,
from include/linux/kernfs.h:11,
from include/linux/sysfs.h:16,
from include/linux/kobject.h:20,
from include/linux/pci.h:35,
from drivers/gpu/drm/amd/amdgpu/vi.c:24:
   drivers/gpu/drm/amd/amdgpu/vi.c: In function 'intel_core_apsm_chk':
   arch/um/include/asm/processor-generic.h:103:19: error: called object is not a function or function pointer
 103 | #define cpu_data (&boot_cpu_data)
 |  ~^~~
   drivers/gpu/drm/amd/amdgpu/vi.c:1144:34: note: in expansion of macro 'cpu_data'
1144 | struct cpuinfo_x86 *c = &cpu_data(0);
 |  ^~~~
>> drivers/gpu/drm/amd/amdgpu/vi.c:1146:18: error: invalid use of undefined type 'struct cpuinfo_x86'
1146 | return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
 |  ^~
   drivers/gpu/drm/amd/amdgpu/vi.c:1146:33: error: invalid use of undefined type 'struct cpuinfo_x86'
1146 | return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
 | ^~
   drivers/gpu/drm/amd/amdgpu/vi.c:1150:1: error: control reaches end of non-void function [-Werror=return-type]
1150 | }
 | ^
   cc1: some warnings being treated as errors


vim +1146 drivers/gpu/drm/amd/amdgpu/vi.c

  1140  
  1141  static bool intel_core_apsm_chk(void)
  1142  {
  1143  #if IS_ENABLED(CONFIG_X86_64)
  1144  struct cpuinfo_x86 *c = &cpu_data(0);
  1145  
> 1146  return (c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
  1147  #else
  1148  return false;
  1149  #endif
  1150  }
  1151  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
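
The report above shows that on this ARCH=um build the CONFIG_X86_64 branch is compiled, yet cpu_data is an object-like macro there, so `&cpu_data(0)` does not compile. The userspace sketch below models one way such a check could be guarded: on a capability that says whether the function-like form exists, rather than on the architecture alone. All identifiers (MOCK_HAS_PER_CPU_DATA, mock_cpu_data, the model constant) are hypothetical stand-ins and not kernel symbols; the real fix may well look different.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct cpuinfo_x86. */
struct mock_cpuinfo {
	int family;
	int model;
};

/* Illustrative model number only; not taken from kernel headers. */
#define MOCK_MODEL_ALDERLAKE 0x97

static struct mock_cpuinfo mock_boot_cpu = { 6, MOCK_MODEL_ALDERLAKE };

/*
 * Defined: models a build where cpu_data(cpu) is a function-like macro
 * that yields an lvalue. Undefine it to model a build, such as the
 * ARCH=um one in the report, where that form is not available.
 */
#define MOCK_HAS_PER_CPU_DATA 1

#ifdef MOCK_HAS_PER_CPU_DATA
#define mock_cpu_data(cpu) mock_boot_cpu
#endif

/* Returns true only when the CPU data can be read and matches. */
static bool mock_intel_core_aspm_chk(void)
{
#ifdef MOCK_HAS_PER_CPU_DATA
	struct mock_cpuinfo *c = &mock_cpu_data(0);

	return c->family == 6 && c->model == MOCK_MODEL_ALDERLAKE;
#else
	return false;
#endif
}
```

With MOCK_HAS_PER_CPU_DATA undefined, the function still compiles and always returns false, which also avoids the "control reaches end of non-void function" error from the log.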

Re: AMDGPU: RX 6500 XT: System reset when loading module [SOLVED]

2022-04-11 Thread Cal Peake

Hi all,

I wanted to put a capper on this just in case anyone was interested, or in 
case any future people find this thread, because I did find a resolution.

Turns out the way to stop the system from crashing was to disable Global 
C-State Control in the BIOS.

Christian, Alex, you guys seem to have been on the right track in that it 
was something power related.

I haven't yet been able to figure out what Global C-State Control exactly 
does.

My best guess as to what was happening: once the GPU power management code 
was loaded and the GPU dropped into a very low power state, the CPU saw 
this and decided to match it, lowering its own power state to such a point 
that it hard resets the system. (Just a wild theory anyway :-)

If anyone knows what this feature really does, or has any better theories 
as to why it doesn't play nice with AMD GPUs, please do share!

Thanks and kind regards,

-- 
Cal Peake

gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'

2022-04-11 Thread Sergei Trofimovich

Current linux-5.17.1 on fresh gcc-12 fails to build with errors like:

ERROR: modpost: "__popcountdi2" 
[drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined!
ERROR: modpost: "__popcountdi2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] 
undefined!

It is occasionally seen by others on previous gcc versions as well:

https://lkml.org/lkml/2021/7/11/261
https://lkml.org/lkml/2018/10/24/403

Calls to '__popcountdi2' are inserted by gcc for code like the following
from 'drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_execution.c':

static inline enum mod_hdcp_status validate_bksv(struct mod_hdcp *hdcp)
{
uint64_t n = 0;
uint8_t count = 0;
u8 bksv[sizeof(n)] = { };

memcpy(bksv, hdcp->auth.msg.hdcp1.bksv, 
sizeof(hdcp->auth.msg.hdcp1.bksv));
n = *(uint64_t *)bksv;

/* Here gcc inserts 'count = __builtin_popcount(n);' */
while (n) {
count++;
n &= (n - 1);
}

return (count == 20) ? MOD_HDCP_STATUS_SUCCESS :
   MOD_HDCP_STATUS_HDCP1_INVALID_BKSV;
}

Note that gcc can insert it regardless of -mno-* options.
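
To make the transformation concrete, here is a small standalone sketch (not kernel code) that places the bit-clearing loop next to the builtin an optimizer may substitute for it. The function names are made up for illustration. On targets without a population-count instruction, the builtin is typically lowered to a call into libgcc, which is where an undefined '__popcountdi2' reference can come from.

```c
#include <stdint.h>

/* Loop idiom: each iteration clears the lowest set bit of n. */
static unsigned int count_bits_loop(uint64_t n)
{
	unsigned int count = 0;

	while (n) {
		count++;
		n &= n - 1;
	}
	return count;
}

/* Equivalent form using the GCC/Clang builtin for 64-bit values. */
static unsigned int count_bits_builtin(uint64_t n)
{
	return (unsigned int)__builtin_popcountll(n);
}
```

Both functions return the same value for every input; a valid BKSV in the snippet above is one whose count is exactly 20.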

How should linux.git handle it? A few options come to mind:

- Perhaps use libgcc.a directly.
- Just implement '__popcountdi2'. Example definition from libgcc:
  https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/libgcc2.c;hb=HEAD#l846
- Or work around it with -fno-builtin-popcount in Makefiles.
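
For the second option, a freestanding implementation could look roughly like the sketch below. It uses the well-known parallel bit-summing technique and is not the libgcc code linked above. The name sketch_popcountdi2 is hypothetical, chosen so that it does not clash with the real libgcc symbol when compiled in userspace.

```c
#include <stdint.h>

/*
 * Counts set bits in a 64-bit value without loops or builtins.
 * Adjacent bit groups are summed in parallel: 1-bit pairs, then
 * 2-bit pairs, then nibbles; the multiply adds up all bytes into
 * the top byte.
 */
static int sketch_popcountdi2(uint64_t x)
{
	x = x - ((x >> 1) & 0x5555555555555555ULL);
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	return (int)((x * 0x0101010101010101ULL) >> 56);
}
```

A version like this has no data-dependent branches, so its cost does not vary with the input, at the price of a 64-bit multiply.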

CCing people who can help route it and/or decide on the fix:
amd-gfx@lists.freedesktop.org, Joe Perches, linux-kbu...@vger.kernel.org,
Jakub Jelinek, Segher Boessenkool, Thomas Gleixner, Peter Zijlstra, Andy
Lutomirski.

WDYT?

Thanks!

-- 

  Sergei

Re: gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'

2022-04-11 Thread Jakub Jelinek

On Mon, Apr 11, 2022 at 10:08:15PM +0100, Sergei Trofimovich wrote:
> Current linux-5.17.1 on fresh gcc-12 fails to build with errors like:
> 
> ERROR: modpost: "__popcountdi2" 
> [drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined!
> ERROR: modpost: "__popcountdi2" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] 
> undefined!
> 
> It is occasionally seen by others on previous gcc versions as well:
> 
> https://lkml.org/lkml/2021/7/11/261
> https://lkml.org/lkml/2018/10/24/403
> 
> '__popcountdi2' are inserted by gcc for code like the following
> from 'drivers/gpu/drm/amd/display/modules/hdcp/hdcp1_execution.c':
> 
> static inline enum mod_hdcp_status validate_bksv(struct mod_hdcp *hdcp)
> {
> uint64_t n = 0;
> uint8_t count = 0;
> u8 bksv[sizeof(n)] = { };
> 
> memcpy(bksv, hdcp->auth.msg.hdcp1.bksv, 
> sizeof(hdcp->auth.msg.hdcp1.bksv));
> n = *(uint64_t *)bksv;
> 
> /* Here gcc inserts 'count = __builtin_popcount(n);' */
> while (n) {
> count++;
> n &= (n - 1);
> }
> 
> return (count == 20) ? MOD_HDCP_STATUS_SUCCESS :
>MOD_HDCP_STATUS_HDCP1_INVALID_BKSV;
> }
> 
> Note that gcc can insert it regardless of -mno-* options.

Just FYI, this has been added in GCC 9 for https://gcc.gnu.org/PR82479
If the kernel implements its own __popcount?i2, it can perhaps with
runtime patching use hw instructions when those are available and
fallback to the generic version.

Jakub

Vega 56 failing to process EDID from VR Headset

2022-04-11 Thread James Dutton

Hi,

I have an Oculus Rift S that I am trying to get working in Linux.
I have an AMD Vega 56 graphics card.

The VR headset plugs into a display port of the Vega56.
The amdgpu driver sees the connection, and tries to process it.
The problem is it cannot process the EDID, so fails to recognise the
VR headset, and the VR headset does not work as a result.
Please find the EDID below.
I am guessing that the following is causing the problem:
Established Timings I & II: none
Standard Timings: none
This forces the driver to rely on the Detailed mode, which it is
failing to handle.
If it helps, when attached to Windows 10, it uses 1440x2560, portrait mode.
Some dmesg lines that may be useful:
// We should pick 1440x2560 as Windows picks that, but for some reason
it rejects it with error 10.
[10402.650734] [drm:create_validate_stream_for_sink [amdgpu]] Mode
1440x2560 (clk 571570) failed DC validation with error 10 (No DP link
bandwidth)
[10402.650991] [drm:update_stream_scaling_settings [amdgpu]]
Destination Rectangle x:0  y:0  width:1440  height:2560
[10402.651225] [drm:create_validate_stream_for_sink [amdgpu]] Mode
1440x2560 (clk 571570) failed DC validation with error 10 (No DP link
bandwidth)


Can anyone give me some pointers as to how to get the amdgpu
driver to accept this EDID?

edid-decode (hex):

00 ff ff ff ff ff ff 00 3e d2 12 00 00 00 00 00
01 1d 01 04 a5 00 00 00 0e 00 00 00 00 00 00 00
00 00 00 00 00 00 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 45 df a0 38 51 00 ee a5 d2 0c
f1 0c 00 00 00 00 00 10 00 00 00 ff 00 31 57 4d
47 48 35 32 30 47 41 39 35 32 00 00 00 fc 00 52
69 66 74 20 53 0a 20 20 20 20 20 20 00 00 00 10
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 33



EDID version: 1.4
Manufacturer: OVR Model 18 Serial Number 0
Made in week 1 of 2019
Digital display
8 bits per primary color channel
DisplayPort interface
Image size is variable
Gamma: 1.00
Supported color formats: RGB 4:4:4, YCrCb 4:4:4
Default (sRGB) color space is primary color space
First detailed timing includes the native pixel format and preferred
refresh rate
Color Characteristics
  Red:   0., 0.
  Green: 0., 0.
  Blue:  0., 0.
  White: 0., 0.
Established Timings I & II: none
Standard Timings: none
Detailed mode: Clock 571.570 MHz, digital composite
   1440 1650 1662 1752 (210  12  90)
   2560 2623 2624 4078 ( 63   1 1454)
   -hsync
   VertFreq: 80.000 Hz, HorFreq: 326.239 kHz
Display Product Serial Number: 1WMGH520GA952
Display Product Name: Rift S
Dummy Descriptor
Checksum: 0x33
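
As a rough plausibility check on the "No DP link bandwidth" message, the numbers in the Detailed mode above can be worked through by hand. The sketch below does that arithmetic. It assumes 24 bits per pixel (from "8 bits per primary color channel"), the standard DisplayPort per-lane rates, and 8b/10b coding overhead only. It does not model the driver's actual validation, and the helper names are made up.

```c
#include <stdint.h>

/* Values copied from the Detailed mode in the EDID dump. */
#define PIXEL_CLOCK_KHZ 571570ULL
#define H_TOTAL         1752ULL
#define V_TOTAL         4078ULL
#define BITS_PER_PIXEL  24ULL	/* assumption: 8 bpc RGB */

/* Refresh rate in millihertz: pixel clock / (htotal * vtotal). */
static uint64_t refresh_millihertz(void)
{
	return PIXEL_CLOCK_KHZ * 1000ULL * 1000ULL / (H_TOTAL * V_TOTAL);
}

/* Uncompressed stream data rate in kbit/s. */
static uint64_t stream_kbps(void)
{
	return PIXEL_CLOCK_KHZ * BITS_PER_PIXEL;
}

/*
 * Returns 1 if the stream fits into the link payload, else 0.
 * lane_rate_kbps is the raw per-lane rate, e.g. 2700000 for HBR
 * or 5400000 for HBR2; 8b/10b coding leaves 80% for payload.
 */
static int mode_fits(uint64_t lane_rate_kbps, unsigned int lanes)
{
	uint64_t payload_kbps = lane_rate_kbps * lanes * 8ULL / 10ULL;

	return stream_kbps() <= payload_kbps;
}
```

Under these assumptions the mode needs about 13.7 Gbit/s, which fits a four-lane HBR2 link (17.28 Gbit/s payload) but not four lanes at HBR or two lanes at HBR2 (8.64 Gbit/s each). If the link came up at a lower rate or with fewer lanes than expected, that alone could produce the error; this is a guess and has not been verified against the driver.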

[igt-dev] [PATCH i-g-t v2] tests/drm_buddy: Add drm buddy test cases

2022-04-11 Thread Arunpravin Paneer Selvam

Add a set of drm buddy test cases to validate the
drm/drm_buddy.c memory allocator.

v2: sorted in alphabetical order

Signed-off-by: Arunpravin Paneer Selvam 
Reviewed-by: Matthew Auld 
---
 tests/drm_buddy.c | 14 ++
 tests/meson.build |  1 +
 2 files changed, 15 insertions(+)
 create mode 100644 tests/drm_buddy.c

diff --git a/tests/drm_buddy.c b/tests/drm_buddy.c
new file mode 100644
index ..06876e0c
--- /dev/null
+++ b/tests/drm_buddy.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "igt.h"
+#include "igt_kmod.h"
+
+IGT_TEST_DESCRIPTION("Basic sanity check of DRM's buddy allocator (struct drm_buddy)");
+
+igt_main
+{
+   igt_kselftests("test-drm_buddy", NULL, NULL, NULL);
+}
diff --git a/tests/meson.build b/tests/meson.build
index b0eab3d6..7261e9aa 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -8,6 +8,7 @@ test_progs = [
'debugfs_test',
'dmabuf',
'device_reset',
+   'drm_buddy',
'drm_import_export',
'drm_mm',
'drm_read',
-- 
2.25.1

Re: AMDGPU: regression on 5.17.1

2022-04-11 Thread Alex Deucher

On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio  wrote:
>
> On Tue, 5 Apr 2022 10:23:16 -0400
> Alex Deucher  wrote:
>
> > On Mon, Apr 4, 2022 at 3:39 PM Michele Ballabio
> >  wrote:
> > >
> > > On Mon, 4 Apr 2022 13:03:41 -0400
> > > Alex Deucher  wrote:
> > >
> > > > On Sun, Apr 3, 2022 at 10:19 AM Michele Ballabio
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > I've hit a regression on 5.17.1 (haven't tested 5.17.0,
> > > > > but 5.16-stable didn't have this problem).
> > > > >
> > > > > The machine is a Ryzen 5 1600 with AMD graphics (RX 560).
> > > > >
> > > > > The regression I hit seems to trigger when the machine is left
> > > > > idle at boot (I don't boot straight to X, I boot to a tty, login
> > > > > and then start X). The machine after a while blanks the screen.
> > > > > Usually, the screen unblanks as the keyboard is hit or the mouse
> > > > > moves, but with kernel 5.17.1 the screen does not wake up. The
> > > > > machine seems to run mostly fine: I can login from ssh, but I
> > > > > cannot reboot or halt it: a sysrq sequence is needed for that.
> > > > > Note that if the screen goes blank under X, it wakes up fine.
> > > > >
> > > > > Below a dmesg and two traces from syslog (they're quite
> > > > > similar).
> > > >
> > > > Can you bisect?  Does setting amdgpu.runpm=0 help?
> > >
> > > I can try to bisect, should I narrow the search to drivers/gpu/drm/
> > > ?
> >
> > I would just do a full bisect if possible in case the change happens
> > to be outside of drm.
> >
> > >
> > > Setting amdgpu.runpm=0 works, the display now unblanks without
> > > problems.
> >
>
> Hi,
> I bisected this, and the first bad commit is
> [087451f372bf76d971184caa258807b7c35aac8f] drm/amdgpu: use generic fb
> helpers instead of setting up AMD own's.
>
> Let me know if you need some more testing.

Thanks.  Do the attached patches fix the issue?

Thanks,

Alex
From be5c2680ed4b3b4121af4f692c905d88d7b6cc00 Mon Sep 17 00:00:00 2001
From: Alex Deucher 
Date: Tue, 28 Dec 2021 17:26:24 -0500
Subject: [PATCH 1/2] drm/amdgpu: don't runtime suspend if there are displays
 attached

We normally runtime suspend when there are displays attached if they
are in the DPMS off state, however, if something wakes the GPU
we send a hotplug event on resume (in case any displays were connected
while the GPU was in suspend) which can cause userspace to light
up the displays again soon after they were turned off.

Prior to
commit 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's."),
the driver took a runtime pm reference when the fbdev emulation was
enabled because we didn't implement proper shadowing support for
vram access when the device was off so the device never runtime
suspended when there was a console bound.  Once that commit landed,
we now utilize the core fb helper implementation which properly
handles the emulation, so runtime pm now suspends in cases where it did
not before.  Ultimately, we need to sort out why runtime suspend is not
working in this case for some users, but this should restore similar
behavior to before.

Fixes: 087451f372bf76 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.")
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 4efaa183abcd..cf795a9b8aef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2508,6 +2508,8 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
+	struct drm_connector *list_connector;
+	struct drm_connector_list_iter iter;
 	/* we don't want the main rpm_idle to call suspend - we want to autosuspend */
 	int ret = 1;
 
@@ -2516,6 +2518,22 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
 		return -EBUSY;
 	}
 
+	/* XXX: Return busy if any displays are connected to avoid
+	 * possible display wake ups after runtime resume due to
+	 * hotplug events in case any displays were connected while
+	 * the GPU was in suspend.  Remove this once that is fixed.
+	 */
+	mutex_lock(&drm_dev->mode_config.mutex);
+	drm_connector_list_iter_begin(drm_dev, &iter);
+	drm_for_each_connector_iter(list_connector, &iter) {
+		if (list_connector->status == connector_status_connected) {
+			ret = -EBUSY;
+			break;
+		}
+	}
+	drm_connector_list_iter_end(&iter);
+	mutex_unlock(&drm_dev->mode_config.mutex);
+
 	if (amdgpu_device_has_dc_support(adev)) {
 		struct drm_crtc *crtc;
 
@@ -2527,11 +2545,7 @@ static int amdgpu_pmops_runtime_idle(struct device *dev)
 			if (ret < 0)
 break;
 		}
-
 	} else {
-		struct drm_connector *list_connector;
-		struct drm_connector_list_iter iter;
-
 		mutex_lock(&drm_dev->mode_config.mutex);
 		drm_modeset_lock(&drm_dev->mode_conf

[linux-next:master] BUILD REGRESSION d12d7e1cfe38e0c36d28c7a9fbbc436ad0d17c14

2022-04-11 Thread kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: d12d7e1cfe38e0c36d28c7a9fbbc436ad0d17c14  Add linux-next specific files for 20220411

Error/Warning reports:

https://lore.kernel.org/linux-mm/202203160358.yulpl6b4-...@intel.com
https://lore.kernel.org/linux-mm/202204081656.6x4pfen4-...@intel.com
https://lore.kernel.org/llvm/202203241958.uw9bwfmd-...@intel.com
https://lore.kernel.org/llvm/202204030606.gml0qvdb-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

ERROR: dtschema minimum version is v2022.3
drivers/bus/mhi/host/main.c:792:13: warning: parameter 'event_quota' set but not used [-Wunused-but-set-parameter]
drivers/gpu/drm/amd/amdgpu/../display/dc/virtual/virtual_link_hwss.c:32:6: warning: no previous prototype for 'virtual_setup_stream_attribute' [-Wmissing-prototypes]
powerpc-linux-ld: fs/btrfs/raid56.c:970: undefined reference to `__udivdi3'

Unverified Error/Warning (likely false positive, please contact us if 
interested):

ERROR: modpost: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined!
ERROR: modpost: "__hexagon_udivdi3" [fs/btrfs/btrfs.ko] undefined!
ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
Makefile:684: arch/h8300/Makefile: No such file or directory
arch/Kconfig:10: can't open file "arch/h8300/Kconfig"
arch/s390/include/asm/spinlock.h:81:3: error: unexpected token in '.rept' 
directive
arch/s390/include/asm/spinlock.h:81:3: error: unknown directive
arch/s390/include/asm/spinlock.h:81:3: error: unmatched '.endr' directive
arch/s390/lib/spinlock.c:78:3: error: unexpected token in '.rept' directive
arch/s390/lib/spinlock.c:78:3: error: unknown directive
arch/s390/lib/spinlock.c:78:3: error: unmatched '.endr' directive
drivers/dma-buf/st-dma-fence-unwrap.c:125:13: warning: variable 'err' set but 
not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn31/dcn31_hubp.c:57:6: warning: no 
previous prototype for 'hubp31_program_extended_blank' [-Wmissing-prototypes]
drivers/hwmon/da9055-hwmon.c:201:9: warning: Call to function 'sprintf' is 
insecure as it does not provide bounding of the memory buffer or security 
checks introduced in the C11 standard. Replace with analogous functions that 
support length arguments or provides boundary checks such as 'sprintf_s' in 
case of C11 
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
drivers/hwmon/vt8231.c:634:9: warning: Call to function 'sprintf' is insecure 
as it does not provide bounding of the memory buffer or security checks 
introduced in the C11 standard. Replace with analogous functions that support 
length arguments or provides boundary checks such as 'sprintf_s' in case of C11 
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
drivers/misc/cb710/debug.c:96:1: warning: Call to function 'sprintf' is 
insecure as it does not provide bounding of the memory buffer or security 
checks introduced in the C11 standard. Replace with analogous functions that 
support length arguments or provides boundary checks such as 'sprintf_s' in 
case of C11 
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
drivers/misc/mei/main.c:1100:9: warning: Call to function 'sprintf' is insecure 
as it does not provide bounding of the memory buffer or security checks 
introduced in the C11 standard. Replace with analogous functions that support 
length arguments or provides boundary checks such as 'sprintf_s' in case of C11 
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
drivers/pci/vgaarb.c:213:17: warning: Value stored to 'dev' during its 
initialization is never read [clang-analyzer-deadcode.DeadStores]
drivers/staging/rtl8723bs/os_dep/ioctl_linux.c:589:29: warning: array subscript 
'struct ndis_802_11_wep[0]' is partly outside array bounds of 'unsigned 
char[25]' [-Warray-bounds]
drivers/tty/synclink_gt.c:3430:2: warning: Call to function 'sprintf' is 
insecure as it does not provide bounding of the memory buffer or security 
checks introduced in the C11 standard. Replace with analogous functions that 
support length arguments or provides boundary checks such as 'sprintf_s' in 
case of C11 
[clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
kernel/module/main.c:2189:4: warning: Null pointer passed as 1st argument to 
memory copy function [clang-analyzer-unix.cstring.NullArg]
kernel/module/main.c:924:9: warning: Call to function 'sprintf' is insecure as 
it does not provide bounding of the memory buffer or security checks introduced 
in the C11 standard. Replace with analogous functions that support length 
arguments or provides boundary checks such as 'spr

Re: [PATCH] drm/radeon: remove r600_blit_shaders.[c|h]

2022-04-11 Thread Alex Deucher

On Sat, Apr 9, 2022 at 1:11 PM Tom Rix  wrote:
>
> The only use of the global variables in r600_blit_shaders.c
> were in the old drivers/gpu/drm/radeon/r600_blit.c
> This file was removed in
> commit 8333f607a631 ("drm/radeon: remove UMS support")
>
> So remove the r600_blit_shaders.[c|h] files
>
> Signed-off-by: Tom Rix 

Applied.  Thanks!

Alex

Patch "drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()" has been added to the 5.4-stable tree

2022-04-11 Thread gregkh



This is a note to let you know that I've just added the patch titled

drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()

to the 5.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


>From foo@baz Mon Apr 11 06:48:58 PM CEST 2022
From: Nathan Chancellor 
Date: Mon, 11 Apr 2022 09:43:08 -0700
Subject: drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()
To: Greg Kroah-Hartman, Sasha Levin
Cc: Felix Kuehling, Alex Deucher, Nick Desaulniers, amd-gfx@lists.freedesktop.org, l...@lists.linux.dev, sta...@vger.kernel.org, Nathan Chancellor
Message-ID: <20220411164308.2491139-3-nat...@kernel.org>

From: Nathan Chancellor 

This patch is for linux-5.4.y only, it has no equivalent change
upstream.

When building x86_64 allmodconfig with tip of tree clang, there is an
instance of -Wstrict-prototypes:

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions()
^
 void
  1 error generated.

amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in
amdgpu_amdkfd_gfx_v10.c does not have the argument types specified,
which causes the warning. GCC does not warn because it permits an
old-style definition if the prototype has the argument types.

This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array
to probe kfd2kgd_calls"), which was a part of a larger series that does
not look very suitable for stable. Just fix this one location, as it was
the only instance of this new warning across a variety of builds.

Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)")
Signed-off-by: Nathan Chancellor 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -165,7 +165,7 @@ static const struct kfd2kgd_calls kfd2kg
.get_tile_config = amdgpu_amdkfd_get_tile_config,
 };
 
-struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions()
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void)
 {
return (struct kfd2kgd_calls *)&kfd2kgd;
 }


Patches currently in stable-queue which might be from nat...@kernel.org are

queue-5.4/drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch
queue-5.4/drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch
queue-5.4/arm64-module-remove-noload-from-linker-script.patch
queue-5.4/tools-build-filter-out-options-and-warnings-not-supported-by-clang.patch
queue-5.4/riscv-module-remove-noload.patch
queue-5.4/tools-build-use-shell-instead-of-to-get-embedded-libperl-s-ccopts.patch
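
The distinction described in the commit message above can be shown with a few lines of standalone C. Before C23, an empty parameter list in a declarator leaves the parameters unspecified, while `(void)` states that there are none, and it is the empty form that -Wstrict-prototypes flags. The names below are hypothetical and only mirror the shape of the fixed function.

```c
/* Hypothetical stand-in for the callback table. */
struct calls {
	int id;
};

static const struct calls table = { 10 };

/* Prototype, as a header would declare it. */
const struct calls *get_functions(void);

/*
 * Definition with an explicit (void), matching the prototype.
 * Writing "get_functions()" here instead is the old-style form
 * that newer clang reports under -Wstrict-prototypes.
 */
const struct calls *get_functions(void)
{
	return &table;
}
```

Keeping the definition's parameter list identical to the header's prototype avoids the warning with either compiler.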

Patch "drm/amdkfd: add missing void argument to function kgd2kfd_init" has been added to the 5.4-stable tree

2022-04-11 Thread gregkh



This is a note to let you know that I've just added the patch titled

drm/amdkfd: add missing void argument to function kgd2kfd_init

to the 5.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


>From foo@baz Mon Apr 11 06:48:58 PM CEST 2022
From: Nathan Chancellor 
Date: Mon, 11 Apr 2022 09:43:07 -0700
Subject: drm/amdkfd: add missing void argument to function kgd2kfd_init
To: Greg Kroah-Hartman, Sasha Levin
Cc: Felix Kuehling, Alex Deucher, Nick Desaulniers, amd-gfx@lists.freedesktop.org, l...@lists.linux.dev, sta...@vger.kernel.org, Colin Ian King, Randy Dunlap, Nathan Chancellor
Message-ID: <20220411164308.2491139-2-nat...@kernel.org>

From: Nathan Chancellor 

From: Colin Ian King 

commit 63617d8b125ed9f674133dd000b6df58d6b2965a upstream.

Function kgd2kfd_init is missing a void argument, add it
to clean up the non-ANSI function declaration.

Acked-by: Randy Dunlap 
Signed-off-by: Colin Ian King 
Signed-off-by: Alex Deucher 
Signed-off-by: Nathan Chancellor 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdkfd/kfd_module.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -82,7 +82,7 @@ static void kfd_exit(void)
kfd_chardev_exit();
 }
 
-int kgd2kfd_init()
+int kgd2kfd_init(void)
 {
return kfd_init();
 }


Patches currently in stable-queue which might be from nat...@kernel.org are

queue-5.4/drm-amdkfd-fix-wstrict-prototypes-from-amdgpu_amdkfd_gfx_10_0_get_functions.patch
queue-5.4/drm-amdkfd-add-missing-void-argument-to-function-kgd2kfd_init.patch
queue-5.4/arm64-module-remove-noload-from-linker-script.patch
queue-5.4/tools-build-filter-out-options-and-warnings-not-supported-by-clang.patch
queue-5.4/riscv-module-remove-noload.patch
queue-5.4/tools-build-use-shell-instead-of-to-get-embedded-libperl-s-ccopts.patch

Re: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd

2022-04-11 Thread Greg Kroah-Hartman

On Mon, Apr 11, 2022 at 09:43:06AM -0700, Nathan Chancellor wrote:
> Hi everyone,
> 
> These two patches resolve two instances of -Wstrict-prototypes with
> newer versions of clang that are present in 5.4. The main Makefile makes
> this a hard error.
> 
> The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add
> missing void argument to function kgd2kfd_init"), which showed up in
> 5.5.
> 
> The second patch has no upstream equivalent, as the code in question was
> removed in commit e392c887df97 ("drm/amdkfd: Use array to probe
> kfd2kgd_calls") upstream, which is part of a larger series that did not
> look reasonable for stable. I opted to just fix the warning in the same
> manner as the prior patch, which is less risky and accomplishes the same
> end result of no warning.
> 
> Colin Ian King (1):
>   drm/amdkfd: add missing void argument to function kgd2kfd_init
> 
> Nathan Chancellor (1):
>   drm/amdkfd: Fix -Wstrict-prototypes from
> amdgpu_amdkfd_gfx_10_0_get_functions()
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> base-commit: 2845ff3fd34499603249676495c524a35e795b45
> -- 
> 2.35.1
> 

Now queued up, thanks.

greg k-h

RE: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd

2022-04-11 Thread Deucher, Alexander

[Public]

> -----Original Message-----
> From: Nathan Chancellor 
> Sent: Monday, April 11, 2022 12:43 PM
> To: Greg Kroah-Hartman ; Sasha Levin
> 
> Cc: Kuehling, Felix ; Deucher, Alexander
> ; Nick Desaulniers
> ; amd-gfx@lists.freedesktop.org;
> l...@lists.linux.dev; sta...@vger.kernel.org; Nathan Chancellor
> 
> Subject: [PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd
> 
> Hi everyone,
> 
> These two patches resolve two instances of -Wstrict-prototypes with newer
> versions of clang that are present in 5.4. The main Makefile makes this a hard
> error.
> 
> The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add missing
> void argument to function kgd2kfd_init"), which showed up in 5.5.
> 
> The second patch has no upstream equivalent, as the code in question was
> removed in commit e392c887df97 ("drm/amdkfd: Use array to probe
> kfd2kgd_calls") upstream, which is part of a larger series that did not look
> reasonable for stable. I opted to just fix the warning in the same manner as
> the prior patch, which is less risky and accomplishes the same end result of
> no warning.
> 
> Colin Ian King (1):
>   drm/amdkfd: add missing void argument to function kgd2kfd_init
> 
> Nathan Chancellor (1):
>   drm/amdkfd: Fix -Wstrict-prototypes from
> amdgpu_amdkfd_gfx_10_0_get_functions()

Series is:
Acked-by: Alex Deucher 

> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> 
> base-commit: 2845ff3fd34499603249676495c524a35e795b45
> --
> 2.35.1

[PATCH 5.4 2/2] drm/amdkfd: Fix -Wstrict-prototypes from amdgpu_amdkfd_gfx_10_0_get_functions()

2022-04-11 Thread Nathan Chancellor

This patch is for linux-5.4.y only, it has no equivalent change
upstream.

When building x86_64 allmodconfig with tip of tree clang, there is an
instance of -Wstrict-prototypes:

  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c:168:59: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions()
^
 void
  1 error generated.

amdgpu_amdkfd_gfx_10_0_get_functions() is prototyped properly in
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h but its definition in
amdgpu_amdkfd_gfx_v10.c does not have the argument types specified,
which causes the warning. GCC does not warn because it permits an
old-style definition if the prototype has the argument types.

This code was eliminated by commit e392c887df97 ("drm/amdkfd: Use array
to probe kfd2kgd_calls"), which was a part of a larger series that does
not look very suitable for stable. Just fix this one location, as it was
the only instance of this new warning across a variety of builds.

Fixes: 6bdadb207224 ("drm/amdgpu: Add navi10 kfd support for amdgpu (v3)")
Signed-off-by: Nathan Chancellor 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index ce30d4e8bf25..f7c4337c1ffe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -165,7 +165,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_tile_config = amdgpu_amdkfd_get_tile_config,
 };
 
-struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions()
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_10_0_get_functions(void)
 {
 	return (struct kfd2kgd_calls *)&kfd2kgd;
 }
-- 
2.35.1
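
For readers unfamiliar with the warning, here is a minimal user-space sketch of the difference the patch addresses. It is not kernel code: `struct calls`, `table` and `get_functions` are made-up stand-ins for illustration. Before C23, an empty parameter list in a declaration leaves the arguments unspecified, while `(void)` states that the function takes none.

```c
/* Hypothetical stand-in for struct kfd2kgd_calls. */
struct calls {
	int id;
};

static const struct calls table = { .id = 10 };

/* Prototype with an explicit (void): calls that pass arguments are rejected. */
const struct calls *get_functions(void);

/*
 * The definition matches the prototype. Writing "get_functions()" here
 * instead is the old-style form that newer clang reports under
 * -Wstrict-prototypes.
 */
const struct calls *get_functions(void)
{
	return &table;
}
```

With the `(void)` form, a call such as `get_functions(1)` fails to compile instead of being silently accepted.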

[PATCH 5.4 1/2] drm/amdkfd: add missing void argument to function kgd2kfd_init

2022-04-11 Thread Nathan Chancellor

From: Colin Ian King 

commit 63617d8b125ed9f674133dd000b6df58d6b2965a upstream.

Function kgd2kfd_init is missing a void argument, add it
to clean up the non-ANSI function declaration.

Acked-by: Randy Dunlap 
Signed-off-by: Colin Ian King 
Signed-off-by: Alex Deucher 
Signed-off-by: Nathan Chancellor 
---
 drivers/gpu/drm/amd/amdkfd/kfd_module.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 986ff52d5750..f4b7f7e6c40e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -82,7 +82,7 @@ static void kfd_exit(void)
 	kfd_chardev_exit();
 }
 
-int kgd2kfd_init()
+int kgd2kfd_init(void)
 {
 	return kfd_init();
 }
-- 
2.35.1

[PATCH 5.4 0/2] Fix two instances of -Wstrict-prototypes in drm/amd

2022-04-11 Thread Nathan Chancellor

Hi everyone,

These two patches resolve two instances of -Wstrict-prototypes with
newer versions of clang that are present in 5.4. The main Makefile makes
this a hard error.

The first patch is upstream commit 63617d8b125e ("drm/amdkfd: add
missing void argument to function kgd2kfd_init"), which showed up in
5.5.

The second patch has no upstream equivalent, as the code in question was
removed in commit e392c887df97 ("drm/amdkfd: Use array to probe
kfd2kgd_calls") upstream, which is part of a larger series that did not
look reasonable for stable. I opted to just fix the warning in the same
manner as the prior patch, which is less risky and accomplishes the same
end result of no warning.

Colin Ian King (1):
  drm/amdkfd: add missing void argument to function kgd2kfd_init

Nathan Chancellor (1):
  drm/amdkfd: Fix -Wstrict-prototypes from
    amdgpu_amdkfd_gfx_10_0_get_functions()

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


base-commit: 2845ff3fd34499603249676495c524a35e795b45
-- 
2.35.1

Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-11 Thread Alex Deucher

On Mon, Apr 11, 2022 at 11:28 AM Paul Menzel  wrote:
>
> Dear Yongqiang,
>
>
> Thank you for your patch.
>
> On 11.04.22 at 15:59, Yongqiang Sun wrote:
> > MS_HYPERV with vega10 doesn't have the interface to process
> > request init data msg.
>
> Should some Hyper-V folks be added to the reviewers list too?
>
> > Check hypervisor type to not send the request for MS_HYPERV.
>
> Please add a blank line between paragraphs.
>
> > Signed-off-by: Yongqiang Sun 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > index 933c41f77c92..56b130ec44a9 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> > @@ -23,6 +23,10 @@
> >
> >   #include 
> >
> > +#ifdef CONFIG_X86
> > +#include 
> > +#endif
> > +
> >   #include 
> >
> >   #include "amdgpu.h"
> > @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
> >   break;
> >   case CHIP_VEGA10:
> >   soc15_set_virt_ops(adev);
> > - /* send a dummy GPU_INIT_DATA request to host on vega10 */
> > - amdgpu_virt_request_init_data(adev);
> > +#ifdef CONFIG_X86
> > + /* not send GPU_INIT_DATA with MS_HYPERV*/
> > + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false)
> > +#endif
>
> Why guard everything with CONFIG_X86? (If it’s needed, it should be done
> in C code.)

X86_HYPER_MS_HYPERV is only available on x86.

Alex

>
> > + /* send a dummy GPU_INIT_DATA request to host on vega10 */
> > + amdgpu_virt_request_init_data(adev);
> >   break;
> >   case CHIP_VEGA20:
> >   case CHIP_ARCTURUS:
>
>
> Kind regards,
>
> Paul
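
One way to reconcile the two points above (the symbol exists only on x86, but the caller should not be littered with `#ifdef`) is to hide the conditional in a small helper. The following is a user-space sketch under stated assumptions, not the merged fix: `running_on_ms_hyperv`, `detect_vega10`, `current_hyper` and `init_requests` are hypothetical names, and `hypervisor_is_type` plus the enum are reduced mocks of the x86 kernel interfaces so that the example builds outside the kernel.

```c
#include <stdbool.h>

/* Mock of the Kconfig symbol; in the kernel this comes from the build. */
#define CONFIG_X86 1

/* Reduced mock of the x86 hypervisor type enum. */
enum x86_hypervisor_type { X86_HYPER_NATIVE, X86_HYPER_MS_HYPERV };

/* Mock state that tests can flip. */
static enum x86_hypervisor_type current_hyper = X86_HYPER_MS_HYPERV;

/* Mock of the kernel helper of the same name. */
static bool hypervisor_is_type(enum x86_hypervisor_type type)
{
	return current_hyper == type;
}

/* Hypothetical helper: the only place that knows about CONFIG_X86. */
static bool running_on_ms_hyperv(void)
{
#ifdef CONFIG_X86
	return hypervisor_is_type(X86_HYPER_MS_HYPERV);
#else
	return false;
#endif
}

/* Counts mock GPU_INIT_DATA requests. */
static int init_requests;

/* Stand-in for the CHIP_VEGA10 case; free of preprocessor conditionals. */
static void detect_vega10(void)
{
	if (!running_on_ms_hyperv())
		init_requests++; /* stands in for amdgpu_virt_request_init_data() */
}
```

The caller also uses the `!cond` form rather than `== false`, which matches the usual kernel coding style.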

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-04-11 Thread Andrey Grodzovsky




On 2022-04-08 21:28, Shuotao Xu wrote:



On Apr 8, 2022, at 11:28 PM, Andrey Grodzovsky wrote:


On 2022-04-08 04:45, Shuotao Xu wrote:

Adding PCIe Hotplug Support for AMDKFD: support for hot-plugging GPU
devices can open doors for many advanced applications in data centers
in the next few years, such as GPU resource
disaggregation. Current AMDKFD does not support hot-unplug for the
following reasons:

1. During PCIe removal, decrement KFD lock which was incremented at
the beginning of hw fini; otherwise kfd_open later is going to
fail.

I assumed you had read my comment last time, yet you still take the same approach.
More details below.

Aha, I like your fix :) I was not familiar with the drm APIs, so I only half
understood your comment last time.

BTW, I tried hot-unplugging a GPU while a rocm application was still running.
From dmesg, the application still tries to access the removed kfd device and
is met with some errors.



The application is supposed to keep running: it holds the drm_device
reference as long as it has an open FD to the device, and final cleanup
comes only after the app dies, thus releasing the FD and the last
drm_device reference.


The application hangs and does not exit in this case.



For graphics apps, what I usually see is a crash from SIGSEGV when
the app tries to access an unmapped MMIO region on the device. I haven't
tested the compute stack, so there might be something I haven't covered.
A hang could mean, for example, waiting on a fence which is never
signaled. Please provide the full dmesg from this case.



Do you have any good suggestions on how to fix it down the line? (HIP 
runtime/libhsakmt or driver)

[64036.631333] amdgpu: amdgpu_vm_bo_update failed
[64036.631702] amdgpu: validate_invalid_user_pages: update PTE failed
[64036.640754] amdgpu: amdgpu_vm_bo_update failed
[64036.641120] amdgpu: validate_invalid_user_pages: update PTE failed
[64036.650394] amdgpu: amdgpu_vm_bo_update failed
[64036.650765] amdgpu: validate_invalid_user_pages: update PTE failed



This probably just means the driver is trying to update PTEs after the
physical device is gone. We usually avoid this by first doing all HW
shutdown early, before PCI remove completes, and, where that is really
tricky, by protecting HW access sections with a drm_dev_enter/exit scope.

For this particular error it would be best to flush
info->restore_userptr_work somewhere in amdgpu_pci_remove, before it
returns (rejecting new process creation and calling
cancel_delayed_work_sync(&process_info->restore_userptr_work) for all
running processes).

Andrey
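
To illustrate the drm_dev_enter/exit scoping mentioned above, here is a small user-space mock. It is only a sketch of the pattern: `mock_drm_device`, `mock_dev_enter`, `mock_dev_exit` and `update_ptes` are hypothetical, and the real drm_dev_enter()/drm_dev_exit() are backed by SRCU rather than a plain flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Mock device: "unplugged" models the state after PCI removal. */
struct mock_drm_device {
	bool unplugged;
	int pte_updates; /* counts simulated hardware accesses */
};

/* Mock counterpart of drm_dev_enter(): refuses entry once unplugged. */
static bool mock_dev_enter(struct mock_drm_device *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = 0;
	return true;
}

/* Mock counterpart of drm_dev_exit(). */
static void mock_dev_exit(int idx)
{
	(void)idx;
}

/* Hardware access is attempted only inside the enter/exit section. */
static int update_ptes(struct mock_drm_device *dev)
{
	int idx;

	if (!mock_dev_enter(dev, &idx))
		return -ENODEV; /* device gone: skip the hardware access */

	dev->pte_updates++; /* stands in for the real PTE write */
	mock_dev_exit(idx);
	return 0;
}
```

In this model, a late worker such as the userptr restore work would get -ENODEV after removal instead of touching the missing device; cancelling the work during removal, as suggested above, avoids even that attempt.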




Really appreciate your help!

Best,
Shuotao
  

2. Remove redundant p2p/io links in sysfs when device is hotplugged
out.

3. New kfd node_id is not properly assigned after a new device is
added after a gpu is hotplugged out in a system. libhsakmt will
find this anomaly, (i.e. node_from !=  in iolinks),
when taking a topology_snapshot, thus returns fault to the rocm
stack.

-- This patch fixes issue 1; another patch by Mukul fixes issues 2&3.
-- Tested on a 4-GPU MI100 node with kernel 5.13.0-kfd; kernel
5.16.0-kfd is unstable out of the box for MI100.
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  5 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  7 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
  drivers/gpu/drm/amd/amdkfd/kfd_device.c| 13 +
  4 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c18c4be1e4ac..d50011bdb5c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -213,6 +213,11 @@ int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm)
  return r;
  }

+int amdgpu_amdkfd_resume_processes(void)
+{
+ return kgd2kfd_resume_processes();
+}
+
  int amdgpu_amdkfd_pre_reset(struct amdgpu_device *adev)
  {
  int r = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f8b9f27adcf5..803306e011c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -140,6 +140,7 @@ void amdgpu_amdkfd_fini(void);
  void amdgpu_amdkfd_suspend(struct amdgpu_device *adev, bool run_pm);
  int amdgpu_amdkfd_resume_iommu(struct amdgpu_device *adev);
  int amdgpu_amdkfd_resume(struct amdgpu_device *adev, bool run_pm);
+int amdgpu_amdkfd_resume_processes(void);
  void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
  const void *ih_ring_entry);
  void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
@@ -347,6 +348,7 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd);
  void kgd2kfd_suspend(struct kfd_dev *kfd, bool

Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-11 Thread Paul Menzel


Dear Yongqiang,


Thank you for your patch.

On 11.04.22 at 15:59, Yongqiang Sun wrote:

MS_HYPERV with vega10 doesn't have the interface to process
request init data msg.


Should some Hyper-V folks be added to the reviewers list too?


Check hypervisor type to not send the request for MS_HYPERV.


Please add a blank line between paragraphs.


Signed-off-by: Yongqiang Sun 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 933c41f77c92..56b130ec44a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -23,6 +23,10 @@
  
  #include 
  
+#ifdef CONFIG_X86
+#include 
+#endif
+
  #include 
  
  #include "amdgpu.h"

@@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
break;
case CHIP_VEGA10:
soc15_set_virt_ops(adev);
-   /* send a dummy GPU_INIT_DATA request to host on vega10 */
-   amdgpu_virt_request_init_data(adev);
+#ifdef CONFIG_X86
+   /* not send GPU_INIT_DATA with MS_HYPERV*/
+   if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false)
+#endif


Why guard everything with CONFIG_X86? (If it’s needed, it should be done 
in C code.)



+   /* send a dummy GPU_INIT_DATA request to host on vega10 */
+   amdgpu_virt_request_init_data(adev);
break;
case CHIP_VEGA20:
case CHIP_ARCTURUS:



Kind regards,

Paul

Re: [PATCH v2 ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Alex Deucher

On Mon, Apr 11, 2022 at 11:25 AM Zhang, Jesse(Jie)  wrote:
>
> [AMD Official Use Only]
>
>
>
> + Deucher, Alexander
>
>
>
> From: Zhang, Jesse(Jie)
> Sent: Monday, 11 April 2022 11:15 pm
> To: amd-gfx@lists.freedesktop.org
> Cc: Huang, Ray ; Zhang, Yifan 
> Subject: [PATCH ] drm/amdgpu: fix discovery ip failed
>
>
>
> [AMD Official Use Only]
>
>
>
> Fix discovery ip failed, and the log:
>
> [   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
> [   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob
> [   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
> [   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob
> [   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
> [   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob
> [   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob
> [   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob
> [   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
> [   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
> [   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob
>

Please add a description of what the fix is doing and add your signed-off-by.

Alex


>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index 766006a075ec..a778b0392e9f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev,
> res = kobject_add(&ip_hw_instance->kobj, NULL,
>   "%d", ip_hw_instance->num_instance);
> next_ip:
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
> +   ip_offset += struct_size(ip, base_address, ip->num_base_address);
> }
> }
>
> @@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
> }
>
> next_ip:
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
> +   ip_offset += struct_size(ip, base_address, ip->num_base_address);
> }
> }
>
> @@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n
> *revision = ip->revision;
> return 0;
> }
> -   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
> +   ip_offset += struct_size(ip, base_address, ip->num_base_address);
> }
> }

RE: [PATCH v2 ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Zhang, Jesse(Jie)

[AMD Official Use Only]

+ Deucher, Alexander

From: Zhang, Jesse(Jie)
Sent: Monday, 11 April 2022 11:15 pm
To: amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Zhang, Yifan 
Subject: [PATCH ] drm/amdgpu: fix discovery ip failed


[AMD Official Use Only]

Fix discovery ip failed, and the log:
[   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob
[   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob
[   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob
[   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob
[   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob
[   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 766006a075ec..a778b0392e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev,
res = kobject_add(&ip_hw_instance->kobj, NULL,
  "%d", ip_hw_instance->num_instance);
next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}

@@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
}

next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}

@@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n
*revision = ip->revision;
return 0;
}
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}

Re: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-11 Thread Alex Deucher

On Mon, Apr 11, 2022 at 9:52 AM Yongqiang Sun  wrote:
>
> MS_HYPERV with vega10 doesn't have the interface to process
> request init data msg.
> Check hypervisor type to not send the request for MS_HYPERV.
>
> Signed-off-by: Yongqiang Sun 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index 933c41f77c92..56b130ec44a9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -23,6 +23,10 @@
>
>  #include 
>
> +#ifdef CONFIG_X86
> +#include 
> +#endif
> +
>  #include 
>
>  #include "amdgpu.h"
> @@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
> break;
> case CHIP_VEGA10:
> soc15_set_virt_ops(adev);
> -   /* send a dummy GPU_INIT_DATA request to host on vega10 */
> -   amdgpu_virt_request_init_data(adev);
> +#ifdef CONFIG_X86
> +   /* not send GPU_INIT_DATA with MS_HYPERV*/
> +   if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false)

Preferred coding style would be:
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
With that fixed:
Acked-by: Alex Deucher 

> +#endif
> +   /* send a dummy GPU_INIT_DATA request to host on vega10 */
> +   amdgpu_virt_request_init_data(adev);
> break;
> case CHIP_VEGA20:
> case CHIP_ARCTURUS:
> --
> 2.25.1
>

Re: [PATCH ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Paul Menzel


Dear Jie,


Thank you for your patch.


On 11.04.22 at 17:15, Zhang, Jesse(Jie) wrote:

You might want to add a space before the (.


[AMD Official Use Only]


Please send a patch with `git format-patch` or similar.


Fix discovery ip failed, and the log:


On what system?


[   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob
[   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob
[   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob
[   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob
[   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob
[   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob


Please describe the reason for the failure, and your fix.

A Signed-off-by line is missing.


diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 766006a075ec..a778b0392e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev,
 res = kobject_add(&ip_hw_instance->kobj, NULL,
   "%d", ip_hw_instance->num_instance);
next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
 }
 }

@@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
 }

next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
 }
 }

@@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n
 *revision = ip->revision;
 return 0;
 }
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
 }
 }



Kind regards,

Paul

Re: [PATCH v2] drm/amd/pm: Disable fan control if not supported

2022-04-11 Thread Paul Menzel


Dear Elena,


On 11.04.22 at 15:32, Elena Sakhnovitch wrote:

On Sienna Cichlid, not all platforms use PMFW based fan control. On such
ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
knobs for fan control also as it is not possible to report or control
fan speed on such platforms through driver.
v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT


Please add a blank line before the v2 line.

Also please give one specific example for a platform without PMFW based 
fan control.



Signed-off-by: Elena Sakhnovitch  
---
  .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index ab3e9d8b831e..ddc388b061b6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu)
}
  }
  
+static void sienna_cichlid_check_fan_support(struct smu_context *smu)

+{
+   struct smu_table_context *table_context = &smu->smu_table;
+   PPTable_t *pptable = table_context->driver_pptable;
+   /* No sort of fan control possible if PPTable has it disabled */
+   smu->adev->pm.no_fan =
+   !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT));
+   if (smu->adev->pm.no_fan)
+   dev_info_once(smu->adev->dev,
+ "PMFW based fan control disabled");


Maybe clarify:

… according to PPTable.


+}
+
  static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
  {
struct smu_table_context *table_context = &smu->smu_table;
@@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
smu->dc_controlled_by_gpio = true;
  
  	sienna_cichlid_check_bxco_support(smu);

+   sienna_cichlid_check_fan_support(smu);
  
  	table_context->thermal_controller_type =

powerplay_table->thermal_controller_type;
@@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu)
GET_PPTABLE_MEMBER(I2cControllers, &table_member);
memcpy(table_member, smc_dpm_table->I2cControllers,
sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header));
-   
+


Unrelated, and should be a separate patch.


return 0;
  }
  



Kind regards,

Paul

[PATCH ] drm/amdgpu: fix discovery ip failed

2022-04-11 Thread Zhang, Jesse(Jie)

[AMD Official Use Only]

Fix discovery ip failed, and the log:
[   56.129549] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.130129] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45056) from ip discovery blob
[   56.130701] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.131283] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (45568) from ip discovery blob
[   56.131855] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (66) from ip discovery blob
[   56.132436] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (480) from ip discovery blob
[   56.133053] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (608) from ip discovery blob
[   56.133626] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (640) from ip discovery blob
[   56.134207] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.134780] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected number_instance (64) from ip discovery blob
[   56.135360] [drm:amdgpu_discovery_validate_ip [amdgpu]] *ERROR* Unexpected hw_id (28672) from ip discovery blob

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 766006a075ec..a778b0392e9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -798,7 +798,7 @@ static int amdgpu_discovery_sysfs_ips(struct amdgpu_device *adev,
res = kobject_add(&ip_hw_instance->kobj, NULL,
  "%d", ip_hw_instance->num_instance);
next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}

@@ -1063,7 +1063,7 @@ int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev)
}

next_ip:
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}

@@ -1113,7 +1113,7 @@ int amdgpu_discovery_get_ip_version(struct amdgpu_device *adev, int hw_id, int n
*revision = ip->revision;
return 0;
}
-   ip_offset += sizeof(*ip) + 4 * (ip->num_base_address - 1);
+   ip_offset += struct_size(ip, base_address, ip->num_base_address);
}
}
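
The patch does not say why the old offset arithmetic breaks. One explanation consistent with the diff, stated here as an assumption, is that `base_address` is declared as a flexible array member, so `sizeof(*ip)` no longer includes one element. The user-space sketch below shows the resulting 4-byte shortfall per entry; `struct ip_flex`, `struct ip_one` and `SKETCH_STRUCT_SIZE` are illustrative, and the real kernel `struct_size()` additionally checks for overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte header followed by a flexible array. */
struct ip_flex {
	uint16_t hw_id;
	uint8_t number_instance;
	uint8_t num_base_address;
	uint8_t ver_major, ver_minor, revision, flags;
	uint32_t base_address[];
};

/* Same header with the older one-element trailing array. */
struct ip_one {
	uint16_t hw_id;
	uint8_t number_instance;
	uint8_t num_base_address;
	uint8_t ver_major, ver_minor, revision, flags;
	uint32_t base_address[1];
};

/* Simplified imitation of struct_size(): header plus n array elements. */
#define SKETCH_STRUCT_SIZE(type, member, n) \
	(offsetof(type, member) + (size_t)(n) * sizeof(((type *)0)->member[0]))

/* Old formula applied to the one-element layout: correct. */
static size_t old_step_one(unsigned int n)
{
	return sizeof(struct ip_one) + 4 * (n - 1);
}

/* Old formula applied to the flexible-array layout: one element short. */
static size_t old_step_flex(unsigned int n)
{
	return sizeof(struct ip_flex) + 4 * (n - 1);
}

/* New formula: independent of how the trailing array is declared. */
static size_t new_step_flex(unsigned int n)
{
	return SKETCH_STRUCT_SIZE(struct ip_flex, base_address, n);
}
```

If the walker advances by too few bytes, every later entry is read at a shifted offset, which would fit the implausible hw_id and number_instance values in the log.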

Re: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH

2022-04-11 Thread Yang, Stanley

[AMD Official Use Only]

Hi Tao,

Regarding this patch series, I have one question: is the ras_ih_flag set
according to the poison mode configuration? If so, the driver will handle poison
as soon as it gets an ecc_irq interrupt, but at that moment there may be no app
consuming it, which seems to conflict with the definition of poison consumption.

Regards,
Stanley
> -----Original Message-----
> From: Zhou1, Tao 
> Sent: Monday, April 11, 2022 7:08 PM
> To: amd-gfx@lists.freedesktop.org; Lazar, Lijo ;
> Ziya, Mohammad zafar ; Zhang, Hawking
> ; Yang, Stanley ; Chai,
> Thomas 
> Cc: Zhou1, Tao 
> Subject: [PATCH 1/3] drm/amdgpu: add poison consumption flag for RAS IH
>
> So we can distinguish RAS poison consumption interrupt from UE interrupt.
>
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index 606df8869b89..380f4c3020c7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -314,6 +314,11 @@ enum amdgpu_ras_ret {
>   AMDGPU_RAS_PT,
>  };
>
> +enum amdgpu_ras_ih_flag {
> + AMDGPU_RAS_IH_POISON_CONSUMPTION = 0,
> + AMDGPU_RAS_IH_LAST,
> +};
> +
>  struct ras_common_if {
>   enum amdgpu_ras_block block;
>   enum amdgpu_ras_error_type type;
> @@ -419,6 +424,8 @@ struct ras_ih_data {
>   unsigned int aligned_element_size;
>   unsigned int rptr;
>   unsigned int wptr;
> + /* interrupt type flag */
> + unsigned int flag;
>  };
>
>  struct ras_manager {
> --
> 2.35.1

Re: [PATCH] drm/amdkfd: Asynchronously free events

2022-04-11 Thread Felix Kuehling


On 2022-04-11 at 08:52, philip yang wrote:



On 2022-04-07 22:39, Felix Kuehling wrote:

The synchronize_rcu call in destroy_events can take several ms, which
noticeably slows down applications destroying many events. Use kfree_rcu
to free the event structure asynchronously and eliminate the
synchronize_rcu call in the user thread.


A similar fix will be needed in kfd_smi_ev_release, as I noticed it is
sometimes slow to exit an app receiving SMI events.


Reviewed-by: Philip Yang 

Thanks. In the case of kfd_smi_ev_release you'd need to use call_rcu
because it's more than a simple kfree when you're freeing the
client->fifo. That would have some implications for module unloading,
too. Maybe you can simplify it by using DECLARE_KFIFO in struct
kfd_smi_client, since we create the fifo with a fixed size anyway. That
way the cleanup is just a kfree_rcu(client).


Regards,
  Felix



Signed-off-by: Felix Kuehling
---
  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 3 +--
  drivers/gpu/drm/amd/amdkfd/kfd_events.h | 2 ++
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 0fef24b0b915..75847c5d5957 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -262,8 +262,7 @@ static void destroy_event(struct kfd_process *p, struct kfd_event *ev)
p->signal_event_count--;
  
  	idr_remove(&p->event_idr, ev->event_id);

-   synchronize_rcu();
-   kfree(ev);
+   kfree_rcu(ev, rcu);
  }
  
  static void destroy_events(struct kfd_process *p)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
index 55d376f56021..1c62c8dd6460 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
@@ -70,6 +70,8 @@ struct kfd_event {
struct kfd_hsa_memory_exception_data memory_exception_data;
struct kfd_hsa_hw_exception_data hw_exception_data;
};
+
+   struct rcu_head rcu; /* for asynchronous kfree_rcu */
  };
  
  #define KFD_EVENT_TIMEOUT_IMMEDIATE 0

[PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-11 Thread Yongqiang Sun

MS_HYPERV with vega10 doesn't have the interface to process
request init data msg.
Check hypervisor type to not send the request for MS_HYPERV.

Signed-off-by: Yongqiang Sun 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 933c41f77c92..56b130ec44a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -23,6 +23,10 @@
 
 #include 
 
+#ifdef CONFIG_X86
+#include 
+#endif
+
 #include 
 
 #include "amdgpu.h"
@@ -721,8 +725,12 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
 			break;
 		case CHIP_VEGA10:
 			soc15_set_virt_ops(adev);
-			/* send a dummy GPU_INIT_DATA request to host on vega10 */
-			amdgpu_virt_request_init_data(adev);
+#ifdef CONFIG_X86
+			/* not send GPU_INIT_DATA with MS_HYPERV*/
+			if (hypervisor_is_type(X86_HYPER_MS_HYPERV) == false)
+#endif
+				/* send a dummy GPU_INIT_DATA request to host on vega10 */
+				amdgpu_virt_request_init_data(adev);
 			break;
 		case CHIP_VEGA20:
case CHIP_ARCTURUS:
-- 
2.25.1

Re: [PATCH v2] drm/amd/pm: Disable fan control if not supported

2022-04-11 Thread Lazar, Lijo





On 4/11/2022 7:02 PM, Elena Sakhnovitch wrote:

> On Sienna Cichlid, not all platforms use PMFW based fan control. On such
> ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
> knobs for fan control also as it is not possible to report or control
> fan speed on such platforms through driver.
> v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT
>
> Signed-off-by: Elena Sakhnovitch
> ---
>   .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 15 ++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> index ab3e9d8b831e..ddc388b061b6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> @@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu)
> }
>   }
>
> +static void sienna_cichlid_check_fan_support(struct smu_context *smu)
> +{
> +   struct smu_table_context *table_context = &smu->smu_table;
> +   PPTable_t *pptable = table_context->driver_pptable;
> +   /* No sort of fan control possible if PPTable has it disabled */
> +   smu->adev->pm.no_fan =
> +   !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT));


#define FEATURE_FAN_CONTROL_BIT 32

Given the above, this doesn't work: 1U << 32 shifts a 32-bit value by
its full width, which is undefined in C. You may cast FeaturesToRun to
uint64_t * and use a mask like 1ULL << FEATURE_FAN_CONTROL_BIT.


Thanks,
Lijo
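
To illustrate the point above, here is a minimal sketch in plain C. It assumes the feature mask is stored as two 32-bit words with the low word first; the helper name feature_bit_is_set is hypothetical and not part of the driver. It combines the two words by value instead of casting the pointer, which is one possible variation on the suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the review; the index counts across the whole 64-bit mask. */
#define FEATURE_FAN_CONTROL_BIT 32

/*
 * Hypothetical helper (not from the patch): test one feature bit in a mask
 * that is stored as two 32-bit words, low word first (an assumption).
 */
static bool feature_bit_is_set(const uint32_t features[2], unsigned int bit)
{
	uint64_t mask;

	assert(bit < 64);
	/* Widen before shifting so the shift happens on a 64-bit value. */
	mask = ((uint64_t)features[1] << 32) | features[0];
	return (mask & (1ULL << bit)) != 0;
}
```

Under these assumptions, no_fan could be computed as the negation of feature_bit_is_set(pptable->FeaturesToRun, FEATURE_FAN_CONTROL_BIT). Bit 32 of the combined mask is bit 0 of the second word, so the shift never reaches the width of its operand.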


> +   if (smu->adev->pm.no_fan)
> +   dev_info_once(smu->adev->dev,
> + "PMFW based fan control disabled");
> +}
> +
>   static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
>   {
> struct smu_table_context *table_context = &smu->smu_table;
> @@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
> smu->dc_controlled_by_gpio = true;
>
>   	sienna_cichlid_check_bxco_support(smu);
> +   sienna_cichlid_check_fan_support(smu);
>
>   	table_context->thermal_controller_type =
> powerplay_table->thermal_controller_type;
> @@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu)
> GET_PPTABLE_MEMBER(I2cControllers, &table_member);
> memcpy(table_member, smc_dpm_table->I2cControllers,
> sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header));
> -
> +
> return 0;
>   }

Re: [PATCH v3] drm: add a check to verify the size alignment

2022-04-11 Thread Christian König





On 11.04.22 at 15:49, Arunpravin Paneer Selvam wrote:
> On 11/04/22 7:02 pm, Matthew Auld wrote:
>> On 11/04/2022 13:42, Christian König wrote:
>>> On 11.04.22 at 11:47, Matthew Auld wrote:
>>>> On 11/04/2022 08:38, Arunpravin Paneer Selvam wrote:
>>>>> Add a simple check to reject any size not aligned to the
>>>>> min_page_size.
>>>>>
>>>>> When size is not aligned to min_page_size, the driver module
>>>>> should handle it in its own way, either by round_up()-ing the
>>>>> size value to min_page_size or just by enabling WARN_ON().
>>>>>
>>>>> If we don't handle the alignment properly, we may hit the
>>>>> following bug: Unigine Heaven has allocation requests where,
>>>>> for example, the required pages are 257 and the alignment
>>>>> request is 256. To allocate the leftover 1 page, the loop
>>>>> continues iterating to find the order value, which is 0, and
>>>>> when it compares this with min_order = 8, it triggers the
>>>>> BUG_ON(order < min_order).
>>>>>
>>>>> v2: add more commit description
>>>>> v3: remove WARN_ON()
>>>>>
>>>>> Signed-off-by: Arunpravin Paneer Selvam
>>>>> Suggested-by: Matthew Auld
>>>> Reviewed-by: Matthew Auld
>>>>
>>> Question here is who will be pushing that to drm-misc-next? Should I
>>> take care of that?
>> Yes, please do.
>>
>>> I think it's time that Arun should request push permission for
>>> drm-misc-next.
> How do I get push permission for drm-misc-next? Should I send a request
> mail to the maintainers? Maybe next time I can push it myself.

See here
https://drm.pages.freedesktop.org/maintainer-tools/commit-access.html

Regards,
Christian.

> Thanks,
> Arun
>>> Thanks,
>>> Christian.

Re: [PATCH v3] drm: add a check to verify the size alignment

2022-04-11 Thread Arunpravin Paneer Selvam

On 11/04/22 7:02 pm, Matthew Auld wrote:
> On 11/04/2022 13:42, Christian König wrote:
>>
>> On 11.04.22 at 11:47, Matthew Auld wrote:
>>> On 11/04/2022 08:38, Arunpravin Paneer Selvam wrote:
>>>> Add a simple check to reject any size not aligned to the
>>>> min_page_size.
>>>>
>>>> When size is not aligned to min_page_size, the driver module
>>>> should handle it in its own way, either by round_up()-ing the
>>>> size value to min_page_size or just by enabling WARN_ON().
>>>>
>>>> If we don't handle the alignment properly, we may hit the
>>>> following bug: Unigine Heaven has allocation requests where,
>>>> for example, the required pages are 257 and the alignment
>>>> request is 256. To allocate the leftover 1 page, the loop
>>>> continues iterating to find the order value, which is 0, and
>>>> when it compares this with min_order = 8, it triggers the
>>>> BUG_ON(order < min_order).
>>>>
>>>> v2: add more commit description
>>>> v3: remove WARN_ON()
>>>>
>>>> Signed-off-by: Arunpravin Paneer Selvam
>>>> Suggested-by: Matthew Auld
>>> Reviewed-by: Matthew Auld 
>>>
>>
>> Question here is who will be pushing that to drm-misc-next? Should I 
>> take care of that?
> 
> Yes, please do.
> 
>>
>> I think it's time that Arun should request push permission for 
>> drm-misc-next.

How do I get push permission for drm-misc-next? Should I send a request
mail to the maintainers? Maybe next time I can push it myself.

Thanks,
Arun
>>
>> Thanks,
>> Christian.
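
The failure described in the quoted commit message can be modelled outside the kernel. The sketch below is a simplified model and not the drm_buddy code: all helper names are hypothetical, and the 4 KiB page size used when calling it is an assumption. It shows why 257 pages with a 256-page minimum block leave a 1-page remainder of order 0, which is below min_order = 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is a multiple of min_page_size. */
static bool size_is_aligned(uint64_t size, uint64_t min_page_size)
{
	/* The mask trick below only works for a power-of-two min_page_size. */
	assert(min_page_size && (min_page_size & (min_page_size - 1)) == 0);
	return (size & (min_page_size - 1)) == 0;
}

/* Hypothetical helper: order of the largest power-of-two block within pages (pages > 0). */
static unsigned int largest_order(uint64_t pages)
{
	unsigned int order = 0;

	while (order < 63 && (pages >> (order + 1)) != 0)
		order++;
	return order;
}

/*
 * Carve the largest power-of-two block off the request on each round.
 * Returns false as soon as a block would be smaller than min_order,
 * which is the condition the commit message says triggers BUG_ON().
 */
static bool split_respects_min_order(uint64_t pages, unsigned int min_order)
{
	while (pages) {
		unsigned int order = largest_order(pages);

		if (order < min_order)
			return false;
		pages -= 1ULL << order;
	}
	return true;
}
```

With 4 KiB pages, size_is_aligned(257 * 4096, 256 * 4096) is false and split_respects_min_order(257, 8) is false, while a 512-page request passes both checks. Rejecting the unaligned size up front means the splitting loop never sees the 1-page remainder.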

[PATCH v2] drm/amd/pm: Disable fan control if not supported

2022-04-11 Thread Elena Sakhnovitch

On Sienna Cichlid, not all platforms use PMFW based fan control. On such
ASICs fan control by PMFW will be disabled in PPTable. Disable hwmon
knobs for fan control also as it is not possible to report or control
fan speed on such platforms through driver.
v2: FEATURE_FAN_CONTROL_MASK is replaced with FEATURE_FAN_CONTROL_BIT

Signed-off-by: Elena Sakhnovitch  
---
 .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index ab3e9d8b831e..ddc388b061b6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -371,6 +371,18 @@ static void sienna_cichlid_check_bxco_support(struct smu_context *smu)
}
 }
 
+static void sienna_cichlid_check_fan_support(struct smu_context *smu)
+{
+   struct smu_table_context *table_context = &smu->smu_table;
+   PPTable_t *pptable = table_context->driver_pptable;
+   /* No sort of fan control possible if PPTable has it disabled */
+   smu->adev->pm.no_fan =
+   !(pptable->FeaturesToRun[1] & (1U << FEATURE_FAN_CONTROL_BIT));
+   if (smu->adev->pm.no_fan)
+   dev_info_once(smu->adev->dev,
+ "PMFW based fan control disabled");
+}
+
 static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
 {
struct smu_table_context *table_context = &smu->smu_table;
@@ -381,6 +393,7 @@ static int sienna_cichlid_check_powerplay_table(struct smu_context *smu)
smu->dc_controlled_by_gpio = true;
 
sienna_cichlid_check_bxco_support(smu);
+   sienna_cichlid_check_fan_support(smu);
 
table_context->thermal_controller_type =
powerplay_table->thermal_controller_type;
@@ -410,7 +423,7 @@ static int sienna_cichlid_append_powerplay_table(struct smu_context *smu)
GET_PPTABLE_MEMBER(I2cControllers, &table_member);
memcpy(table_member, smc_dpm_table->I2cControllers,
sizeof(*smc_dpm_table) - sizeof(smc_dpm_table->table_header));
-   
+
return 0;
 }
 
-- 
2.25.1
