Re: [PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"

2023-07-12 Thread Guilherme G. Piccoli
On 12/07/2023 11:47, Pillai, Aurabindo wrote:
> Hi Guilherme,
> 
> Sorry there was one more patch which I missed to attach. Please add this
> 3rd patch and retry.
> 
> Reverting that patch would cause high power consumption on Navi2x GPUs and
> also cause hangs on certain multi-monitor configurations. With these 3
> patches, you're getting the same effect as reverting the aforementioned
> patches, but it makes the reverted sequence available only for Steam
> Deck hardware.
> 

Thanks a lot for your detailed explanation, and the 3rd patch! Indeed,
amdgpu works fine on Deck with that - great =)

Feel free to add my:
Tested-by: Guilherme G. Piccoli  #Steam Deck

Oh, a Fixes tag would also make sense, I guess.
BTW, if it's possible to submit the 3 patches as a proper series to get
them merged in the 6.5-rc cycle (the sooner the better), I'd really appreciate it!

Cheers,


Guilherme


Re: [PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"

2023-07-12 Thread Guilherme G. Piccoli
On 11/07/2023 15:22, Aurabindo Pillai wrote:
> [...]
> Hi,
> 
> Sorry for the delayed response, this patch went unnoticed. This revert would 
> break asics. Could you try the attached patch without reverting this one?

Hi Aurabindo, thanks for your response!

I've tried kernel 6.5-rc1, and it seems the issue is present, due to the
patch being merged in Linus' tree [as 1598fc576420 ("drm/amd/display:
Program OTG vtotal min/max selectors unconditionally for DCN1+")].

Then, I tried both your attached patches on top of that, and
unfortunately, the behavior is the same: Steam Deck doesn't boot with
graphics, and we can see the single error "amdgpu 0000:04:00.0: [drm]
*ERROR* [CRTC:67:crtc-0] flip_done timed out" on dmesg.

Do you / Alex think we could get this revert for 6.5-rc2, so at least we
could boot mainline there while the issue is handled? It would be an
intermediate fix. You mentioned it breaks some asics, but did they work
until now, without your patch?

Thanks,


Guilherme


[PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"

2023-07-03 Thread Guilherme G. Piccoli
This reverts commit 06c3a652a787efc960af7c8816036d25c4227c6c.

After this commit, the Steam Deck cannot boot with graphics anymore;
the following message is observed on dmesg:

"[drm] ERROR [CRTC:67:crtc-0] flip_done timed out"

No other error is observed; it just stays like that. After bisecting
amd-staging-drm-next, we narrowed it down to this commit. Seems it
makes sense to revert it to keep the tree bootable until a proper
solution is worked out.

Cc: Aurabindo Pillai 
Cc: André Almeida 
Cc: Melissa Wen 
Cc: Rodrigo Siqueira 
Signed-off-by: Guilherme G. Piccoli 

---

Hi Alex / Aurabindo, we couldn't boot the Deck with HEAD
(amd-staging-drm-next); git bisect led to this commit. Since its
description already mentions a potential proper solution, related
to the DMCUB (and some complex state tracking), I thought it was
more effective to revert it to allow booting the tree on the Deck (and
maybe other HW - I only tested the Deck, BTW).
Lemme know your thoughts.

Special thanks to André and Melissa for helping with the debug / bisect!
We're open to test alternative patches, feel free to ping.
Cheers,

Guilherme


 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 15 ---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c | 10 --
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
index 0e8f4f36c87c..27419cd98264 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
@@ -945,10 +945,19 @@ void optc1_set_drr(
OTG_FORCE_LOCK_ON_EVENT, 0,
OTG_SET_V_TOTAL_MIN_MASK_EN, 0,
OTG_SET_V_TOTAL_MIN_MASK, 0);
-   }
 
-   // Setup manual flow control for EOF via TRIG_A
-   optc->funcs->setup_manual_trigger(optc);
+   // Setup manual flow control for EOF via TRIG_A
+   optc->funcs->setup_manual_trigger(optc);
+
+   } else {
+   REG_UPDATE_4(OTG_V_TOTAL_CONTROL,
+   OTG_SET_V_TOTAL_MIN_MASK, 0,
+   OTG_V_TOTAL_MIN_SEL, 0,
+   OTG_V_TOTAL_MAX_SEL, 0,
+   OTG_FORCE_LOCK_ON_EVENT, 0);
+
+   optc->funcs->set_vtotal_min_max(optc, 0, 0);
+   }
 }
 
void optc1_set_vtotal_min_max(struct timing_generator *optc, int vtotal_min, int vtotal_max)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
index 58bdbd859bf9..d6f095b4555d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
@@ -462,16 +462,6 @@ void optc2_setup_manual_trigger(struct timing_generator *optc)
 {
struct optc *optc1 = DCN10TG_FROM_TG(optc);
 
-   /* Set the min/max selectors unconditionally so that
-* DMCUB fw may change OTG timings when necessary
-* TODO: Remove the w/a after fixing the issue in DMCUB firmware
-*/
-   REG_UPDATE_4(OTG_V_TOTAL_CONTROL,
-OTG_V_TOTAL_MIN_SEL, 1,
-OTG_V_TOTAL_MAX_SEL, 1,
-OTG_FORCE_LOCK_ON_EVENT, 0,
-OTG_SET_V_TOTAL_MIN_MASK, (1 << 1)); /* TRIGA */
-
REG_SET_8(OTG_TRIGA_CNTL, 0,
OTG_TRIGA_SOURCE_SELECT, 21,
OTG_TRIGA_SOURCE_PIPE_SELECT, optc->inst,
-- 
2.40.1



Re: [PATCH 2/2] drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well

2023-05-12 Thread Guilherme G. Piccoli
On 12/05/2023 10:29, Alex Deucher wrote:
> [...]
>> BTW, I couldn't test this one since I don't have GFX11 HW, so appreciate
>> if anyone from AMD (or any community member) could give it a try...
> 
> We've tested this on gfx11 and applied the patches.  Thanks!
> 
> Alex
> 

That's great, thanks a lot Alex!


Re: [PATCH 2/2] drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well

2023-05-09 Thread Guilherme G. Piccoli
On 09/05/2023 13:49, Bas Nieuwenhuizen wrote:
> From: "Guilherme G. Piccoli" 
> 
> (Bas: speculative change to mirror gfx10/gfx9)
> 
> Signed-off-by: Guilherme G. Piccoli 
> Cc: Alex Deucher 
> ---

Thanks a lot for both patches, Bas! For this second one, despite my having
attached it on gitlab, you should also add your Signed-off-by, since you're
the one sending it to the ML (I guess heh)

BTW, I couldn't test this one since I don't have GFX11 HW, so I'd appreciate it
if anyone from AMD (or any community member) could give it a try...

Cheers,


Guilherme


Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-20 Thread Guilherme G. Piccoli
On 20/04/2023 12:56, gre...@linuxfoundation.org wrote:
> [...]
> That's 3000+ emails ago for me :)

/head_exploding

this is > 1000 emails per day, wow... my sympathies to you heh

> [...]
>> tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
>> hereby requesting the merge. Some backport/context adjustment was
>> necessary and it was properly tested in the Steam Deck.
> 
> Ok, we'll queue it up soon, thanks.
> 
> greg k-h

Thanks =)


Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-20 Thread Guilherme G. Piccoli
On 20/04/2023 12:02, gre...@linuxfoundation.org wrote:
>> [...]
>>> Which "one" are you referring to here?
>>>
>>> confused,
>>>
>>> greg k-h
>>
>> This one, sent in this email thread.
> 
> I don't have "this email thread" anymore, remember, some of us get
> thousand+ emails a day...

I don't really understand the issue, to be honest - we are talking in the
very same email thread! The email was sent April 18th; it's not old or anything.

But in any case, for reference, this is the original email from the lore
archives:
https://lore.kernel.org/stable/20230418221522.1287942-1-gpicc...@igalia.com/

> 
>> The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
>> Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
>> hash(es) is 542a56e8eb44 heh
> 
> But that commit says it fixes a problem in the 6.2 tree, why is this
> relevant for 6.1.y?
> 

That is explained in the email - and the very reason for it is the
duplicate hashes we are discussing here.

The fix commit in question points the "Fixes:" tag to 82132ecc5432
("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to
be in 6.2 tree, right?

But notice that 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect
sram mode") is the *same* offender and... is present in 6.1!

In other words, when I first wrote this fix, I just checked the tree
quickly and came up with "Fixes: 82132ecc5432"; to be thorough, I
should have also pointed the Fixes tag to 9a8cc8cabc1e, so it would be
picked up for 6.1.y.


tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
hereby requesting the merge. Some backport/context adjustment was
necessary and it was properly tested in the Steam Deck.

Thanks,


Guilherme


Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-20 Thread Guilherme G. Piccoli
On 20/04/2023 09:42, gre...@linuxfoundation.org wrote:
> [...]
>> @Greg, can you pick this one? Thanks!
> 
> Which "one" are you referring to here?
> 
> confused,
> 
> greg k-h

This one, sent in this email thread.

The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
hash(es) is 542a56e8eb44 heh


Cheers,

Guilherme


Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-20 Thread Guilherme G. Piccoli
On 19/04/2023 17:04, Deucher, Alexander wrote:
> [...]
>> So, let me check if I understood properly: there are 2 trees (-fixes and 
>> -next),
>> and the problem is that their merge onto mainline happens apart and there
>> are kind of duplicate commits, that were first merged on -fixes, then "re-
>> merged" on -next, right?
>>
> 
> Each subsystem works on new features, then when the merge window opens, Linus 
> pulls them into mainline.  At that point, mainline goes into RCs and then 
> mainline is bug fixes only.  Meanwhile in parallel, each subsystem is working 
> on new feature development for the next merge window (subsystem -next trees). 
>  The trees tend to lag behind mainline a bit.  Most developers focus on them 
> in support of upcoming features.  They are usually also where bugs are fixed. 
>  If a bug is fixed in the -next tree, what's the best way to get it into 
> mainline?  I guess ideally it would be fixed in mainline and then mainline 
> would be merged into everyone's -next branch, but that's not always 
> practical.  Often times new features depend on bug fixes and then you'd end 
> up stalling new development waiting for a back merge, or you'd have to rebase 
> your -next branches regularly so that they would shed any patches in mainline 
> which is not great either.  We try and cherry-pick -x when we can to show the 
> relationship.
> 
>> Would be possible to clean the -next tree right before the merge, using
>> some script to find commits there that are going to be merged but are
>> already present? Then you'd have a -next-to-merge tree that is clean and
>> doesn't present dups, avoiding this.
> 
> There's no real way to clean it without rewriting history.  You'd basically 
> need to do regular backmerges and rebases of the -next trees.  Also then what 
> do you do if you already have a feature in -next (say Dave or Daniel have 
> already pulled in my latest amdgpu PR for -next) and you realize that one of 
> those patches is an important bug fix for mainline?  If the drm -next tree 
> rebased then all of the other drm driver trees that feed into it would need 
> to rebase as well otherwise they'd have stale history.
> 
> You also have the case of a fix in -next needing to land in mainline, but due 
> to differences in the trees, it needs backporting to apply properly.
> 
>>
>> Disclaimer: I'm not a maintainer, maybe there are smart ways of doing that or
>> perhaps my suggestion is silly and unfeasible heh But seems likely that other
>> maintainers face this problem as well and came up with some solution -
>> avoiding the dups would be a good thing, IMO, if possible.
> 
> 
> No worries.  I agree.  I haven't seen anything so far that really 
> handles this effectively.
> 
> Alex

Thanks a bunch Alex, I appreciate your time detailing the issue, which
seems...quite complex and annoying, indeed ={

What I'll do from now on is try to check all hashes that match a
commit, so I can add more than one Fixes tag when that makes sense. At
least this way, I can prevent fixes missing from stable, like this patch heh
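
For instance, a subject-based search - a hypothetical invocation, in the same
spirit as the git commands used elsewhere in this thread - would have shown
both hashes at once, something like:

$ git log --all --oneline --grep="drm/amdgpu: enable Vangogh VCN indirect sram mode"
82132ecc5432 drm/amdgpu: enable Vangogh VCN indirect sram mode
9a8cc8cabc1e drm/amdgpu: enable Vangogh VCN indirect sram mode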

@Greg, can you pick this one? Thanks!

Cheers,


Guilherme


Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-19 Thread Guilherme G. Piccoli
On 19/04/2023 10:16, Deucher, Alexander wrote:
> [...]
>> This is quite strange for me, we have 2 commit hashes pointing to the
>> *same* commit, and each one is present..in a different release !!?!
>> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y stable
>> misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).
>>
>> Alex, do you have an idea why sometimes commits from the AMD tree
>> appear duplicate in mainline? Specially in different releases, this could 
>> cause
>> some confusion I guess.
> 
> This is pretty common in the drm.  The problem is there is not a good way to 
> get bug fixes into both -next and -fixes without constant back merges.  So 
> the patches end up in -next and if they are important for stable, they go to 
> -fixes too.  We don't want -next to be broken, but we can't wait until the 
> next kernel cycle to get the bug fixes into the current kernel and stable.  I 
> don't know of a good way to make this smoother.
> 
> Alex

Thanks Alex, seems quite complicated indeed.

So, let me check if I understood properly: there are 2 trees (-fixes and
-next), and the problem is that their merges into mainline happen separately,
so there are kind-of duplicate commits that were first merged via -fixes,
then "re-merged" via -next, right?

Would it be possible to clean the -next tree right before the merge, using
some script to find commits there that are about to be merged but are
already present in mainline? Then you'd have a -next-to-merge tree that is
clean and has no dups, avoiding this.
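
For reference, git-cherry already does this kind of patch-id matching - the
branch names below are purely hypothetical:

$ git cherry -v mainline drm-next
# commits prefixed with "-" have a patch-id-equivalent commit already in
# mainline, i.e. exactly the dups that would end up "re-merged"

A pre-merge cleanup script could start from that output.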

Disclaimer: I'm not a maintainer; maybe there are smart ways of doing
that, or perhaps my suggestion is silly and unfeasible heh
But it seems likely that other maintainers face this problem as well and
have come up with some solution - avoiding the dups would be a good thing,
IMO, if possible.

Cheers,


Guilherme




[PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-04-19 Thread Guilherme G. Piccoli
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.

The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevent this mode from working properly though, ending up in a failed
probe. An example below, observed on the Steam Deck:

[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block  failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]

Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW to probe properly and the VCN IP block to work fine.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: sta...@vger.kernel.org
Cc: James Zhu 
Cc: Leo Liu 
Signed-off-by: Guilherme G. Piccoli 
Signed-off-by: Alex Deucher 
---

Hi folks, this was build/boot tested on the Deck. I've also adjusted the
context; the function was reworked in 6.2.

But what a surprise it was for me to not see this fix already in 6.1.y,
since I had CCed stable - and the reason for that is really peculiar:


$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432
Leo Liu : drm/amdgpu: enable Vangogh VCN indirect sram mode

$ git describe --contains 82132ecc5432
v6.2-rc1~124^2~1^2~13


$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e
Leo Liu : drm/amdgpu: enable Vangogh VCN indirect sram mode

$ git describe --contains 9a8cc8cabc1e
v6.1-rc8~16^2^2


This is quite strange to me: we have 2 commit hashes pointing to the *same*
commit, and each one is present... in a different release!?!
Since I originally marked this patch as fixing 82132ecc5432, 6.1.y stable
misses it, since it only contains 9a8cc8cabc1e (which is the same patch!).

Alex, do you have an idea why sometimes commits from the AMD tree appear
duplicated in mainline? Especially across different releases, this could
cause some confusion, I guess.

Thanks in advance,


Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index ce64ca1c6e66..5c1193dd7d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include <linux/dmi.h>
 #include 
 #include 
 #include 
@@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 {
unsigned long bo_size;
const char *fw_name;
+   const char *bios_ver;
const struct common_firmware_header *hdr;
unsigned char fw_check;
unsigned int fw_shared_size, log_offset;
@@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
adev->vcn.indirect_sram = true;
+   /*
+* Some Steam Deck's BIOS versions are incompatible with the
+* indirect SRAM mode, leading to amdgpu being unable to get
+* properly probed (and even potentially crashing the kernel).
+* Hence, check for these versions here - notice this is
+* restricted to Vangogh (Deck's APU).
+*/
+   bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+   if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+!strncmp("F7A0114", bios_ver, 7))) {
+   adev->vcn.indirect_sram = false;
+   dev_info(adev->dev,
+   "Steam Deck quirk: indirect SRAM disabled on 
BIOS %s\n", bios_ver);
+   }
break;
case IP_VERSION(3, 0, 16):
fw_name = FIRMWARE_DIMGREY_CAVEFISH;
-- 
2.40.0



[PATCH] drm/amd/pm: Fix incorrect comment about Vangogh power cap support

2023-03-30 Thread Guilherme G. Piccoli
The comment mentions that the power1 cap attributes are not supported on
Vangogh, but the opposite is actually true: among APUs, only Vangogh is
supported. While at it, also fix the Renoir comment below (thanks
Melissa for noticing that!).

Cc: Lijo Lazar 
Cc: Melissa Wen 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, this is a very trivial one, just to fix the comments - since I needed
to re-read the code/comments to figure things out, I'm trying to improve them
here to avoid others getting confused as well heh

Lemme know in case I made some mistake (even after re-re-reading the code).
Cheers,


Guilherme


 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d75a67cfe523..6460906c354c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -3351,7 +3351,7 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
   attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr)) /* can't manage state */
effective_mode &= ~S_IWUSR;
 
-   /* not implemented yet for GC 10.3.1 APUs */
+   /* In the case of APUs, this is only implemented on Vangogh */
if (((adev->family == AMDGPU_FAMILY_SI) ||
 ((adev->flags & AMD_IS_APU) && (gc_ver != IP_VERSION(10, 3, 1))) &&
(attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
@@ -3360,7 +3360,7 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
 attr == &sensor_dev_attr_power1_cap_default.dev_attr.attr))
return 0;
 
-   /* not implemented yet for APUs having <= GC 9.3.0 */
+   /* not implemented yet for APUs having < GC 9.3.0 (Renoir) */
if (((adev->family == AMDGPU_FAMILY_SI) ||
 ((adev->flags & AMD_IS_APU) && (gc_ver < IP_VERSION(9, 3, 0))) &&
(attr == &sensor_dev_attr_power1_average.dev_attr.attr))
-- 
2.40.0



Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes

2023-03-28 Thread Guilherme G. Piccoli
On 28/03/2023 03:07, Yang, WenYou wrote:
> [AMD Official Use Only - General]
> [...]
>> Hi Wenyou, thank you for the clarification and for the interesting patch set!
>>
>> So, just so I can understand: is it expected that gamers disable SMT? I heard
>> some games got their performance improved, but not sure the reason...if you
>> have thoughts on that, I'm pretty interested!
> Hi Guilherme,
> 
> No, it is not. It is not to disable SMT.
> 
> Yes, there is a commit to get performance improved.
> https://github.com/torvalds/linux/commit/a8fb40966f19ff81520d9ccf8f7e2b95201368b8
> 
> Best Regards,
> Wenyou

Thanks, this one has been present in the Deck's kernel for a while.
Cheers,


Guilherme


Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes

2023-03-27 Thread Guilherme G. Piccoli
On 26/03/2023 23:49, Yang, WenYou wrote:
> Hi Guilherme,
> 
> Thank you for your attention on the patch set.
> The purpose of the patch set is to improve the performance when playing the 
> game.
> 
> Best Regards,
> Wenyou

Hi Wenyou, thank you for the clarification and for the interesting patch
set!

So, just so I can understand: is it expected that gamers disable SMT? I
heard some games got their performance improved, but I'm not sure of the
reason... if you have thoughts on that, I'm pretty interested!

Cheers,


Guilherme


Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes

2023-03-23 Thread Guilherme G. Piccoli
Hi Wenyou Yang, first of all thanks for the improvement!

I'd like to ask you (and all CCed) if it would be possible to explain a
bit the goal / functionality behind these patches.

By reading the commit descriptions and code, I can understand code-wise
what's going on and how this will message the FW on SMT changes. What I
couldn't parse is the purpose of this - or, in other words, what do we
gain from it? Also, why only on Vangogh?

Since I don't have the spec, I couldn't read it and learn myself - apologies
if this is somewhat of a silly question. Also, if for some reason you
cannot respond (like a HW "NDA"), it's fine too.

Thanks in advance,


Guilherme


[PATCH] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes

2023-03-12 Thread Guilherme G. Piccoli
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevent this mode from working properly though, ending up in a failed
probe. An example below, observed on the Steam Deck:

[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block  failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]

Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW to probe properly and the VCN IP block to work fine.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: sta...@vger.kernel.org
Cc: James Zhu 
Cc: Leo Liu 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, based on the feedback from the gitlab issue, here is the upstream
attempt to quirk the Steam Deck BIOSes that have known issues with the
indirect SRAM mode. I've tested it on both of the quirked BIOSes, and also
with some working ones. This patch is based on agd5f/amd-staging-drm-next.

Thanks in advance for reviews!
Cheers,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 02d428ddf2f8..dc4f3f4cb644 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include <linux/dmi.h>
 #include 
 #include 
 #include 
@@ -114,6 +115,24 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
(adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
adev->vcn.indirect_sram = true;
 
+   /*
+* Some Steam Deck's BIOS versions are incompatible with the
+* indirect SRAM mode, leading to amdgpu being unable to get
+* properly probed (and even potentially crashing the kernel).
+* Hence, check for these versions here - notice this is
+* restricted to Vangogh (Deck's APU).
+*/
+   if (adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 2)) {
+   const char *bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+   if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+!strncmp("F7A0114", bios_ver, 7))) {
+   adev->vcn.indirect_sram = false;
+   dev_info(adev->dev,
+   "Steam Deck quirk: indirect SRAM disabled on 
BIOS %s\n", bios_ver);
+   }
+   }
+
hdr = (const struct common_firmware_header *)adev->vcn.fw->data;
adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
 
-- 
2.39.2



Re: [PATCH v3] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-02 Thread Guilherme G. Piccoli
On 02/02/2023 13:12, Luben Tuikov wrote:
> Hi Guilherme,
> 
> Thanks for redoing to a v3. This patch is:
> 
> Reviewed-by: Luben Tuikov 
> 
> Regards,
> Luben
> 

Thank you for the reviews Luben, much appreciated!


Re: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-02 Thread Guilherme G. Piccoli
On 02/02/2023 08:58, Christian König wrote:
> [...]
>> +if (!ring->no_scheduler && ring->sched.ops)
>>  drm_sched_fini(&ring->sched);
> 
> I think we should drop the check for no_scheduler here and just call 
> drm_sched_fini() when the scheduler instance was initialized before.
> 
> Background is that I've found a couple of places where we potentially 
> set no_scheduler to false after the scheduler was already initialized.
> 
> This is then probably leading to a memory leak or worse.
> 
> Regards,
> Christian.

Thanks Christian, nice finding! And thanks for the reviews Guchun / Luben =)

I just submitted a V3 [0], but didn't add the review tags from Guchun /
Luben, since the patch changed.

Cheers,


Guilherme


[0]
https://lore.kernel.org/dri-devel/20230202134856.1382169-1-gpicc...@igalia.com/


[PATCH v3] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-02 Thread Guilherme G. Piccoli
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

It happens that we faced a driver probe failure on the Steam Deck
recently, and drm_sched_fini() was called even though its
counterpart had not been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Notice that ideally we'd use sched.ready for that; such field is set as the last
thing in drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König 
Cc: Guchun Chen 
Cc: Luben Tuikov 
Cc: Mario Limonciello 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..faff4a3f96e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler)
+   /*
+* Notice we check for sched.ops since there's some
+* override on the meaning of sched.ready by amdgpu.
+* The natural check would be sched.ready, which is
+* set as drm_sched_init() finishes...
+*/
+   if (ring->sched.ops)
drm_sched_fini(&ring->sched);
 
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0



Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Guilherme G. Piccoli
On 01/02/2023 13:21, Luben Tuikov wrote:
> Hi Guilherme,
> 
> Since setting sched->ready to false, seems to be taking place in, directly 
> amdgpu_ring_fini()
> and in amdgpu_fence_driver_sw_fini() indirectly as that function calls 
> drm_sched_fini()
> which sets it to false, we seem to have two competing policies of,
> "set ready to false to show that _fini() was called, and set to false to 
> disable IB submissions".
> 
> To that effect, your patch is generally correct, as it would be the case of 
> an early failure
> and unroll from (indirectly) amdgpu_device_init_schedulers().
> 
> Please resubmit your patch but using .ops as Christian suggested, as .name is 
> sufficient,
> but .ops is necessary.
> 
> On a side-note: in the future we should probably discern between
> "this ring has an initialized and working scheduler" (looking up at DRM), from
> "this ring can take on IBs to send them down to the hardware" (looking down 
> at hardware).
> Sched->ready seems to be overloaded with these disparate states, and this is 
> why you need
> to use .ops to guard calling drm_sched_fini().
> 
> Regards,
> Luben

Thanks a lot Luben, makes perfect sense!

Also, thanks for everyone that provided feedback here, very interesting
discussion.

Submitted V2:
https://lore.kernel.org/dri-devel/20230201164814.1353383-1-gpicc...@igalia.com/
Cheers,


Guilherme


[PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-02-01 Thread Guilherme G. Piccoli
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

It happens that we faced a driver probe failure on the Steam Deck
recently, and drm_sched_fini() was called even though its
counterpart had not been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Notice that ideally we'd use sched.ready for that; such field is set as the last
thing in drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended up using sched.ops as per
Christian's suggestion [0].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König 
Cc: Guchun Chen 
Cc: Luben Tuikov 
Cc: Mario Limonciello 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..3b962cb680a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler)
+   /*
+* Notice we check for sched.ops since there's some
+* override on the meaning of sched.ready by amdgpu.
+* The natural check would be sched.ready, which is
+* set as drm_sched_init() finishes...
+*/
+   if (!ring->no_scheduler && ring->sched.ops)
drm_sched_fini(&ring->sched);
 
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0



Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Guilherme G. Piccoli
On 31/01/2023 14:52, Alex Deucher wrote:
> [...]
>> (b) We can't use sched.ready, which would make sense...but amdgpu
>> overrides its meaning, the driver manipulates this value for its own
>> purposes of tracking ring init, or something like that.
>>
>> This is the tangential topic: what should we do here? My understanding
>> of Alex's message is that we could have a "ready" field in the ring
>> structure and stop messing with sched.ready - does it make sense Alex?
> 
> Yes, I think so.  The tricky part will be figuring out which current
> sched.ready checks are checking for the scheduler being ready vs.
> whether the ring itself is ready.
> 

Thanks, makes sense!

$ grep -nr "sched.ready" drivers/gpu/drm/amd/ | wc -l
83

Maybe not super tough, I hope heh

>>
>> Guchun / Christian, does it also make sense for you?
>>
>>
>> Regarding (a), I could re-submit having s/sched.name/sched.ops, no
>> biggies, I tested both to be fair, before sending...I just chose name
>> but any field that is proper initialized on drm_sched_init() would work.
> 
> Yeah, I think ops is fine.  You could even use sched.ready after we
> clean up the use of that in the driver.  There are already a bunch of
> places where we check sched.ready to check if the scheduler really is
> ready.

Hmm... unfortunately, that doesn't work. This was a case in which sched.ready
was set to true in the ring init routine, but the scheduler wasn't properly
initialized. So, a different key for comparison is required... I'll
re-submit with sched.ops.

After a potential rework of the driver to get rid of the sched.ready
manipulation, it could then be changed to properly use this flag... makes
sense to you?

Tnx again for the prompt review!
Cheers,


Guilherme


Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-31 Thread Guilherme G. Piccoli
On 31/01/2023 10:58, Chen, Guchun wrote:
> Hi Christian,
> 
> Do you think if it makes sense that we can set 'ring->sched.ready' to be true 
> in each ring init, even if before executing/setting up drm_sched_init in 
> amdgpu_device_init_schedulers? As 'ready' is a member of gpu scheduler 
> structure.
> 
> Regards,
> Guchun
> 

Hi folks, thanks a lot for the feedback so far, much appreciated!

I'm feeling a bit confused, especially since there seem to be 2
orthogonal (yet related) topics being discussed; let me try to summarize
my understanding, and then we can discuss further:

(a) The first problem is the one addressed in the patch - how to prevent
drm_sched_fini() to get called if drm_sched_init() wasn't called?

I've proposed sched.name, seems Christian prefers sched.ops, correct?


(b) We can't use sched.ready, which would make sense... but amdgpu
overrides its meaning; the driver manipulates this value for its own
purposes of tracking ring init, or something like that.

This is the tangential topic: what should we do here? My understanding
of Alex's message is that we could have a "ready" field in the ring
structure and stop messing with sched.ready - does it make sense Alex?

Guchun / Christian, does it also make sense for you?


Regarding (a), I could re-submit having s/sched.name/sched.ops, no
biggies - to be fair, I tested both before sending... I just chose name,
but any field that is properly initialized by drm_sched_init() would work.

Thanks,


Guilherme


[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

It happens that we faced a driver probe failure on the Steam Deck
recently, and drm_sched_fini() was called even though its
counterpart had not been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Notice that ideally we'd use sched.ready for that; such field is set as the last
thing in drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended up using another sched field.

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Cc: Andrey Grodzovsky 
Cc: Guchun Chen 
Cc: Mario Limonciello 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, first of all thanks in advance for reviews / comments!
Notice that I've used the Fixes tag more in the sense to bring it
to stable, I didn't find a good patch candidate that added the
call to drm_sched_fini(), was reaching way too old commits...so
067f44c8b459 seems a good candidate - or maybe not?

Now, with regards to sched.ready, I spent a bit of time figuring out what
was happening... would it maybe be feasible to stop using that to
mark some kind of ring status? I think it should be possible to add a
flag to the ring structure for that, and free sched.ready from
being manipulated by the amdgpu driver - what are your thoughts on that?
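
Just to illustrate the idea, a rough sketch - the struct and field names
below are purely hypothetical, not an actual proposal for the layout:

/* Sketch: split the two meanings currently packed into sched.ready */
struct amdgpu_ring_state_sketch {
	bool hw_ready;		/* "ring can take IBs" - owned by amdgpu ring init/test */
	bool sched_ready;	/* "DRM scheduler initialized" - owned by drm_sched_init() */
};

That way amdgpu would flip only hw_ready, leaving sched.ready to the DRM
scheduler alone.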

I could try myself, but first of course I'd like to raise the
"temperature" on this topic and check if somebody is already working
on that.

Cheers,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..e154eb8241fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler)
+   /*
+* Notice we check for sched.name since there's some
+* override on the meaning of sched.ready by amdgpu.
+* The natural check would be sched.ready, which is
+* set as drm_sched_init() finishes...
+*/
+   if (!ring->no_scheduler && ring->sched.name)
drm_sched_fini(&ring->sched);
 
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0



Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
+ Luben

(sorry, missed that in the first submission).

On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
> routine - such function is expected to be called only after the
> respective init function - drm_sched_init() - was executed successfully.
> 
> Happens that we faced a driver probe failure in the Steam Deck
> recently, and the function drm_sched_fini() was called even without
> its counter-part had been previously called, causing the following oops:
> 
> amdgpu: probe of :04:00.0 failed with error -110
> BUG: kernel NULL pointer dereference, address: 0090
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
> Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
> [...]
> Call Trace:
>  
>  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
>  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
>  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
>  devm_drm_dev_init_release+0x49/0x70
>  [...]
> 
> To prevent that, check if the drm_sched was properly initialized for a
> given ring before calling its fini counter-part.
> 
> Notice ideally we'd use sched.ready for that; such field is set as the latest
> thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
> field - in the above oops for example, it was a GFX ring causing the crash, 
> and
> the sched.ready field was set to true in the ring init routine, regardless of
> the state of the DRM scheduler. Hence, we ended-up using another sched field.
> 
> Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in 
> s3 test (v2)")
> Cc: Andrey Grodzovsky 
> Cc: Guchun Chen 
> Cc: Mario Limonciello 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> 
> 
> Hi folks, first of all thanks in advance for reviews / comments!
> Notice that I've used the Fixes tag more in the sense to bring it
> to stable, I didn't find a good patch candidate that added the
> call to drm_sched_fini(), was reaching way too old commits...so
> 067f44c8b459 seems a good candidate - or maybe not?
> 
> Now, with regards sched.ready, spent a bit of time to figure what
> was happening...would be feasible maybe to stop using that to
> mark some kind ring status? I think it should be possible to add a
> flag to the ring structure for that, and free sched.ready from
> being manipulate by the amdgpu driver, what's your thoughts on that?
> 
> I could try myself, but first of course I'd like to raise the
> "temperature" on this topic and check if somebody is already working
> on that.
> 
> Cheers,
> 
> Guilherme
> 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 00444203220d..e154eb8241fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
> *adev)
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
>  
> - if (!ring->no_scheduler)
> + /*
> +  * Notice we check for sched.name since there's some
> +  * override on the meaning of sched.ready by amdgpu.
> +  * The natural check would be sched.ready, which is
> +  * set as drm_sched_init() finishes...
> +  */
> + if (!ring->no_scheduler && ring->sched.name)
>   drm_sched_fini(&ring->sched);
>  
>   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)


[PATCH] drm/amd/display: Trivial swizzle-related code clean-ups

2023-01-30 Thread Guilherme G. Piccoli
This is a very trivial code clean-up related to commit 5468c36d6285
("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS"). This commit
added a validation on driver probe to prevent invalid TMDS modes, but one
of the fake properties (swizzle) ended up causing a warning on driver
probe; it was reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/2264.

It was fixed by commit 105a8b8698e2 ("drm/amd/display: patch cases with
unknown plane state to prevent warning"), but the validation code had
a double variable assignment, which we hereby remove. Also, the fix relies
on the dcn2{0,1}_patch_unknown_plane_state() callbacks, so while at it we
took the opportunity to perform a small code clean-up in those routines.

Cc: Aurabindo Pillai 
Cc: Daniel Wheeler 
Cc: Fangzhi Zuo 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Mark Broadworth 
Cc: Melissa Wen 
Cc: Rodrigo Siqueira 
Cc: Sung Joon Kim 
Cc: Swapnil Patel 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 8 ++--
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 6 ++
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86a2f7f58550..e71e94663d14 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6336,7 +6336,6 @@ static enum dc_status dm_validate_stream_and_context(struct dc *dc,
dc_plane_state->plane_size.surface_size.width  = stream->src.width;
dc_plane_state->plane_size.chroma_size.height  = stream->src.height;
dc_plane_state->plane_size.chroma_size.width   = stream->src.width;
-   dc_plane_state->tiling_info.gfx9.swizzle =  DC_SW_UNKNOWN;
dc_plane_state->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB;
dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN;
dc_plane_state->rotation = ROTATION_ANGLE_0;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 531f405d2554..3af24ef9cb2d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2225,14 +2225,10 @@ enum dc_status dcn20_patch_unknown_plane_state(struct dc_plane_stat
enum surface_pixel_format surf_pix_format = plane_state->format;
unsigned int bpp = resource_pixel_format_to_bpp(surf_pix_format);
 
-   enum swizzle_mode_values swizzle = DC_SW_LINEAR;
-
+   plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_S;
if (bpp == 64)
-   swizzle = DC_SW_64KB_D;
-   else
-   swizzle = DC_SW_64KB_S;
+   plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_D;
 
-   plane_state->tiling_info.gfx9.swizzle = swizzle;
return DC_OK;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index fbcf0afeae0d..8f9244fe5c86 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -1393,15 +1393,13 @@ static uint32_t read_pipe_fuses(struct dc_context *ctx)
 
static enum dc_status dcn21_patch_unknown_plane_state(struct dc_plane_state *plane_state)
 {
-   enum dc_status result = DC_OK;
-
if (plane_state->ctx->dc->debug.disable_dcc == DCC_ENABLE) {
plane_state->dcc.enable = 1;
/* align to our worst case block width */
plane_state->dcc.meta_pitch = ((plane_state->src_rect.width + 1023) / 1024) * 1024;
}
-   result = dcn20_patch_unknown_plane_state(plane_state);
-   return result;
+
+   return dcn20_patch_unknown_plane_state(plane_state);
 }
 
 static const struct resource_funcs dcn21_res_pool_funcs = {
-- 
2.39.0



Re: [PATCH 0/4] Update AMD APU IP version docs

2023-01-18 Thread Guilherme G. Piccoli
On 18/01/2023 14:28, Mario Limonciello wrote:
> Guilherme G. Piccoli submitted some patches recently to the mailing list
> related to removing IP version detection code.  As part of the discussion
> it was mentioned that non-obviously some IP blocks are shared between
> multiple APU parts.
> 
> The example given was for launched products Rembrandt and Mendocino VCN,
> but in looking at the existing table I found that we don't actually
> indicate the VCN version for Mendocino.
> 
> So I added this to the table, and then noticed we're missing references
> to some other recently launched products.
> 
> So update the table with all missing launched products as well as products
> launching this year that don't have IP version changes.
> 
> Mario Limonciello (4):
>   Documentation/gpu: Add MP0 version to apu-asic-info-table
>   Documentation/gpu: Update lines for GREEN_SARDINE and YELLOW_CARP
>   Documentation/gpu: Add Mendocino to apu-asic-info-table
>   Documentation/gpu: Add Raphael to apu-asic-info-table
> 
>  .../gpu/amdgpu/apu-asic-info-table.csv | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 

Thanks a lot for the improvement Mario, much appreciated!
Cheers,


Guilherme


Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Guilherme G. Piccoli
On 17/01/2023 15:14, Limonciello, Mario wrote:
> [Public]
> 
> 
> 
>> -Original Message-
>> From: Guilherme G. Piccoli 
>> Sent: Tuesday, January 17, 2023 12:14
>> To: Limonciello, Mario ; amd-
>> g...@lists.freedesktop.org; Deucher, Alexander
>> 
>> Cc: dri-de...@lists.freedesktop.org; Koenig, Christian
>> ; Pan, Xinhui ;
>> ker...@gpiccoli.net; kernel-...@igalia.com; Zhu, James
>> ; Lazar, Lijo ; Liu, Leo
>> ; Jiang, Sonny 
>> Subject: Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect
>> SRAM HW model check
>>
>> On 17/01/2023 15:08, Limonciello, Mario wrote:
>>> [...]
>>>
>>> Should have added this tag too:
>>> Suggested-by: Alexander Deucher 
>>>
>>> Looks good to me, thanks!
>>> Reviewed-by: Mario Limonciello 
>>>
>>
>> You're totally right, thanks for the reminder and apologies for missing
>> that! Just sending V3 heheh
>>
>> Ah, thanks for the reviews and prompt responses.
>> Cheers,
>>
>>
>> Guilherme
> 
> No need to resend.  Patchwork will embed the tags when we pick this up.

Already did, but thanks again for the info - learning a lot in this
thread =)


[PATCH v3 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Guilherme G. Piccoli
The HW model validation that guards the indirect SRAM checking in the
VCN code path is redundant - there's no model that's not included in the
switch, making it useless in practice [0].

So, let's remove this switch statement for good.

[0] lore.kernel.org/amd-gfx/mn0pr12mb61013d20b8a2263b22ae1bcfe2...@mn0pr12mb6101.namprd12.prod.outlook.com

Suggested-by: Alex Deucher 
Reviewed-by: Mario Limonciello 
Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Sonny Jiang 
Signed-off-by: Guilherme G. Piccoli 
---


V3:
* Added Mario's review tag and Alex's suggested tag - thanks
for the reminder Mario!

V2:
* Changed the approach after ML discussion - instead of cleaning up
the switch statement, removed it entirely - special thanks to Alex
and Mario for the feedback!


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 81 +
 1 file changed, 3 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1b1a3c9e1863..02d428ddf2f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -110,84 +110,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
for (i = 0; i < adev->vcn.num_vcn_inst; i++)
atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
 
-   switch (adev->ip_versions[UVD_HWIP][0]) {
-   case IP_VERSION(1, 0, 0):
-   case IP_VERSION(1, 0, 1):
-   case IP_VERSION(2, 5, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 2, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 6, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 0):
-   case IP_VERSION(3, 0, 64):
-   case IP_VERSION(3, 0, 192):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 16):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 33):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 1, 1):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 1, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(4, 0, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(4, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
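
For reference, once the switch is gone, the whole thing collapses into the
single consolidated check - the same code visible in the 6.1.y backport
earlier in this archive:

	if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
	    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
		adev->vcn.indirect_sram = true;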

[PATCH v3 1/2] drm/amdgpu/vcn: Adjust firmware names indentation

2023-01-17 Thread Guilherme G. Piccoli
This is an incredibly trivial fix, just for the sake of
"aesthetic" organization of the defines. Some were space-based,
most were tab-based, and there was a lack of "alignment"; now it's
all the same and aligned.

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Reviewed-by: Alex Deucher 
Signed-off-by: Guilherme G. Piccoli 
---


V2/V3:
* Added Alex's review tag - thanks!


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 -
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f8397d993f23..1b1a3c9e1863 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -36,26 +36,26 @@
 #include "soc15d.h"
 
 /* Firmware Names */
-#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
-#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
-#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
-#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
-#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
-#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
-#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
-#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
-#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
-#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
-#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
-#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
+#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
+#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
+#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
+#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
+#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
+#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
+#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
+#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
+#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
+#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
+#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
+#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
 #define FIRMWARE_DIMGREY_CAVEFISH  "amdgpu/dimgrey_cavefish_vcn.bin"
-#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
-#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
-#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
-#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
-#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
-#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
-#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
+#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
+#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
+#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
+#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
+#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
+#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
+#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
 
 MODULE_FIRMWARE(FIRMWARE_RAVEN);
 MODULE_FIRMWARE(FIRMWARE_PICASSO);
-- 
2.39.0



Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Guilherme G. Piccoli
On 17/01/2023 15:08, Limonciello, Mario wrote:
> [...]
> 
> Should have added this tag too:
> Suggested-by: Alexander Deucher 
> 
> Looks good to me, thanks!
> Reviewed-by: Mario Limonciello 
> 

You're totally right, thanks for the reminder and apologies for missing
that! Just sending V3 heheh

Ah, thanks for the reviews and prompt responses.
Cheers,


Guilherme


[PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Guilherme G. Piccoli
The HW model validation that guards the indirect SRAM checking in the
VCN code path is redundant - every supported model is included in the
switch, which makes the check useless in practice [0].

So, let's remove this switch statement for good.

[0] 
lore.kernel.org/amd-gfx/mn0pr12mb61013d20b8a2263b22ae1bcfe2...@mn0pr12mb6101.namprd12.prod.outlook.com
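
In other words, all the removed cases collapse into the single shared
check, applied unconditionally - a sketch of the resulting code (the
archived diff below is truncated, but the diffstat shows 3 insertions
and every case body is identical):

	if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
	    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
		adev->vcn.indirect_sram = true;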

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Signed-off-by: Guilherme G. Piccoli 
---


V2:
* Changed the approach after the ML discussion - instead of cleaning up
the switch statement, removed it entirely - special thanks to Alex
and Mario for the feedback!

Notice that patch 3 was dropped from this series after reviews.


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 81 +
 1 file changed, 3 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1b1a3c9e1863..02d428ddf2f8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -110,84 +110,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
for (i = 0; i < adev->vcn.num_vcn_inst; i++)
atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
 
-   switch (adev->ip_versions[UVD_HWIP][0]) {
-   case IP_VERSION(1, 0, 0):
-   case IP_VERSION(1, 0, 1):
-   case IP_VERSION(2, 5, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 2, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 6, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 0):
-   case IP_VERSION(3, 0, 64):
-   case IP_VERSION(3, 0, 192):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 16):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 33):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 1, 1):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 1, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(4, 0, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(4, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case I

[PATCH v2 1/2] drm/amdgpu/vcn: Adjust firmware names indentation

2023-01-17 Thread Guilherme G. Piccoli
This is an incredibly trivial fix, just for the sake of the
"aesthetic" organization of the defines. Some were space-based, most
were tab-based, and there was a lack of "alignment"; now it's all
consistent and aligned.

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Reviewed-by: Alex Deucher 
Signed-off-by: Guilherme G. Piccoli 
---


V2:
* Added Alex's review tag - thanks!


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 -
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f8397d993f23..1b1a3c9e1863 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -36,26 +36,26 @@
 #include "soc15d.h"
 
 /* Firmware Names */
-#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
-#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
-#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
-#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
-#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
-#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
-#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
-#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
-#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
-#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
-#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
-#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
+#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
+#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
+#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
+#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
+#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
+#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
+#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
+#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
+#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
+#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
+#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
+#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
 #define FIRMWARE_DIMGREY_CAVEFISH  "amdgpu/dimgrey_cavefish_vcn.bin"
-#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
-#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
-#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
-#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
-#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
-#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
-#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
+#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
+#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
+#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
+#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
+#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
+#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
+#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
 
 MODULE_FIRMWARE(FIRMWARE_RAVEN);
 MODULE_FIRMWARE(FIRMWARE_PICASSO);
-- 
2.39.0



Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-17 Thread Guilherme G. Piccoli
On 17/01/2023 13:24, Limonciello, Mario wrote:
> [...]
>>> Though I see two problems with that: first, I'm not sure what's the
>>> impact in the GPU functioning when I disable some IP block.
>>>
> 
> It depends on the individual block what the impact is.  For example
> if you don't have VCN, then you can't do any accelerated video playback.
> 
>>> Second, the parameter is a bit hard to figure - we need to clear a bit
>>> for the IP block we want to disable, and the doc suggest to read on
>>> dmesg to get this information (it seems it changes depending on the HW
>>> model), but I couldn't parse the proper bit from dmesg. Needed to
>>> instrument the kernel to find the proper bit heh
>>>
> 
> Isn't it this stuff (taken from a CZN system):
> 
> [7.797779] [drm] add ip block number 0 
> [7.797781] [drm] add ip block number 1 
> [7.797782] [drm] add ip block number 2 
> [7.797783] [drm] add ip block number 3 
> [7.797783] [drm] add ip block number 4 
> [7.797784] [drm] add ip block number 5 
> [7.797785] [drm] add ip block number 6 
> [7.797786] [drm] add ip block number 7 
> [7.797787] [drm] add ip block number 8 
> [7.797788] [drm] add ip block number 9 
> 
> So for that system it would be bit 8 to disable vcn.
> 
> In terms of how debugging would work:
> I would expect when you get your failure it will have been the previous
> block # that failed, and so you can reboot with that block masked and
> see if you get further.
> 

Thanks Mario, much appreciated! You're totally right and I messed up not
seeing these obvious messages...

So, I'll just drop the parameter on V2.
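
For the record, with a dmesg like the one you pasted (10 IP blocks,
VCN being bit 8 on that CZN system - the bit position varies per
ASIC), booting without VCN would be something like:

	amdgpu.ip_block_mask=0x2ff	# all 10 blocks except bit 8 (VCN)
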
Cheers,


Guilherme


Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-17 Thread Guilherme G. Piccoli
On 16/01/2023 23:33, Limonciello, Mario wrote:
> [...]
> 
> For debugging these type of problems, I think an effective debugging 
> tactic would have been to mask the IP block (amdgpu.ip_block_mask).

Thank you, it worked indeed - nice suggestion!

Though I see two problems with that: first, I'm not sure what the
impact on GPU functioning is when I disable some IP block.

Second, the parameter is a bit hard to figure out - we need to clear
the bit for the IP block we want to disable, and the doc suggests
reading dmesg to get this information (it seems to change depending on
the HW model), but I couldn't parse the proper bit from dmesg. I needed
to instrument the kernel to find the proper bit heh

The second part is easy to improve (we can just show this bit in
dmesg!); I might do that instead of proposing this parameter, which
apparently didn't raise much excitement after all heheh
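
The sketch I have in mind is roughly the following (assuming the
message comes from amdgpu_device_ip_block_add(), with the block number
doubling as the mask bit):

	DRM_INFO("add ip block number %d <%s> (ip_block_mask bit %d)\n",
		 adev->num_ip_blocks, ip_block_version->funcs->name,
		 adev->num_ip_blocks);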

Finally, I'm still curious about why the Deck was working fine with the
indirect SRAM mode disabled (by mistake) in many kernels - was it in
practice the same as disabling the VCN IP block?

Thanks,


Guilherme



Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-17 Thread Guilherme G. Piccoli
On 16/01/2023 20:00, Alex Deucher wrote:
> [...]
> 
> It's not clear to me when this would be used.  We only disable it
> briefly during new asic bring up, after that we never touch it again.
> No end user on production hardware should ever have to change it and
> doing so would break VCN on their system.
> 
> Alex

[+ Pierre-Loup]

Steam Deck is facing a pretty weird scenario then heheh

Commit 82132ecc543 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
corrected a long-term issue in which the indirect SRAM mode wasn't
enabled for Vangogh - and the Deck's GPU architecture is Vangogh, so it
was working perfectly with that disabled.
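
(For reference, that change boils down to giving Vangogh's IP version
the same check every other ASIC gets - roughly:)

	case IP_VERSION(3, 0, 2):	/* Vangogh */
		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
			adev->vcn.indirect_sram = true;
		break;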

It happens that a bad FW candidate seems to have broken something - it
was a bit convoluted to debug, but we proved that disabling indirect
SRAM is a good workaround so far; it "restored the behavior" from
before 82132ecc543.

Hence my proposal - this parameter would've made life so much easier,
and we're starting to use it "downstream" now. While I understand that
of course the FW should be fixed, meanwhile this is a cheap solution to
allow further debugging and real use of the system.

Thanks,


Guilherme


Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-16 Thread Guilherme G. Piccoli
On 16/01/2023 19:27, Liu, Leo wrote:
> Secure part requires PSP load VCN boot sequence which is with indirect
> sram mode.
> 
> Regards,
> Leo

With that said, and given the plan of just setting the indirect SRAM
mode unconditionally (with no switch/cases anymore - I'll submit V2
tomorrow if everybody agrees), does anybody oppose having this
parameter for debugging purposes?

I first considered doing it through debugfs, but that was obviously a
(very) bad idea, since this needs to be an early-boot tuning heheh

Cheers,


Guilherme


Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code

2023-01-16 Thread Guilherme G. Piccoli
On 16/01/2023 18:41, Alex Deucher wrote:
> [...]
>> +   case IP_VERSION(4, 0, 0):   /* Vega10 */
> 
> This comment is incorrect.  Vega10 does not have VCN (it has UVD and VCE).
> 
> Alex

Thanks Alex! I'll resubmit V2 with this comment removed.

I've stolen it from
https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c#L1147
, but obviously I misunderstood the mapping here heheh

Cheers,


Guilherme


Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code

2023-01-16 Thread Guilherme G. Piccoli
On 16/01/2023 18:49, Limonciello, Mario wrote:
> [...] 
> 
> The problem with adding a comment about which platform it's in is that
> this will probably go stale.  Some IP blocks are re-used in multiple ASICs.
> 
> VCN 3, 1, 1 is a great example.  This is used in both Yellow Carp (Rembrandt) 
> as well
> as Mendocino.   I would think a better approach would be to have a single 
> point
> of documentation somewhere in the source tree that we can update after new
> ASICs launch that could act as a decoder ring.
> 
> It's effort to remember to update it, but it's at least a single point of 
> failure.
> 

Hi Mario, thanks - it makes a lot of sense! I can remove the comments
in V2; I'll wait a bit more for (potential) extra feedback, but I agree
with you, so I'm inclined to remove them.

Cheers,


Guilherme


[PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code

2023-01-16 Thread Guilherme G. Piccoli
Currently, both conditions checked to enable indirect SRAM are
repeated for every HW/IP model. Let's simplify this a bit so the
code is more compact and readable.

While at it, add the legacy names as a comment per case block, to
help whoever is reading the code and doesn't have much experience
with the name/number mapping.

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, first of all thanks in advance for reviews/comments!

This work is based on the agd5f/amd-staging-drm-next branch - there is
a patch from Mario there that refactored amdgpu_vcn.c, and since it
dropped the legacy names from the switch cases, I've decided to include
them here as comments.

I'm not sure if that's a good idea; it feels helpful for somebody not
so used to the code to read the codenames instead of bare numbers, but
if you wanna move away from the legacy names for good, lemme know
and I can rework this without the comments.

Cheers,


Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 84 +
 1 file changed, 16 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1b1a3c9e1863..1f880e162d9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -111,78 +111,26 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
	atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
 
switch (adev->ip_versions[UVD_HWIP][0]) {
-   case IP_VERSION(1, 0, 0):
-   case IP_VERSION(1, 0, 1):
-   case IP_VERSION(2, 5, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 2, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 6, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 0):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(2, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 0):
-   case IP_VERSION(3, 0, 64):
-   case IP_VERSION(3, 0, 192):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 2):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 16):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 0, 33):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
-   case IP_VERSION(3, 1, 1):
-   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-   adev->vcn.indirect_sram = true;
-   break;
+   case IP_VERSION(1, 0, 0):   /* Raven (1/2) / Picasso */
+   case IP_VERSION(1, 0, 1):   /* Raven (1/2) / Picasso */
+   case IP_VERSION(2, 0, 0):   /* Navi10 */
+   case IP_VERSION(2, 0, 2):   /* Navi12 / Navi14 */
+   case IP_VERSION(2, 2, 0):   /* Renoir / Green Sardine */
+   case IP_VERSION(2, 5, 0):   /* Arcturus */
+   case IP_VERSION(2, 6, 0):   /* Aldebaran */
+   case IP_VERSION(3, 0, 0):   /* Sienna Cichlid / Navy Flounder */
+   case IP_VERSION(3, 0, 2):   /* Vangogh */
+   case IP_VERSION(3, 0, 

[PATCH 1/3] drm/amdgpu/vcn: Adjust firmware names indentation

2023-01-16 Thread Guilherme G. Piccoli
This is an incredibly trivial fix, just for the sake of the
"aesthetic" organization of the defines. Some were space-based, most
were tab-based, and there was a lack of "alignment"; now it's all
consistent and aligned.

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 -
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f8397d993f23..1b1a3c9e1863 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -36,26 +36,26 @@
 #include "soc15d.h"
 
 /* Firmware Names */
-#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
-#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
-#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
-#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
-#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
-#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
-#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
-#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
-#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
-#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
-#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
-#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
+#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
+#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
+#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
+#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
+#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
+#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
+#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
+#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
+#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
+#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin"
+#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
+#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
 #define FIRMWARE_DIMGREY_CAVEFISH  "amdgpu/dimgrey_cavefish_vcn.bin"
-#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
-#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
-#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
-#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
-#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
-#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
-#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
+#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
+#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
+#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
+#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
+#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
+#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
+#define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
 
 MODULE_FIRMWARE(FIRMWARE_RAVEN);
 MODULE_FIRMWARE(FIRMWARE_PICASSO);
-- 
2.39.0



[PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-16 Thread Guilherme G. Piccoli
Currently the FW loading path performs some checks based on the IP
model, and in case it is advertised as supported, the VCN indirect SRAM
mode is used.

It happens that if there's any issue in the FW and this mode ends up
not being properly supported, the driver probe fails [0]. Debugging
this requires rebuilding the driver, so to allow fast debugging and
experiments, add a parameter to force the indirect SRAM mode to
true/false from the kernel command line; the parameter default is -1,
which doesn't change the driver's current behavior.
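
For instance (matching the parameter semantics above), booting with:

	amdgpu.indirect_sram=0

forces the non-indirect load path regardless of the IP version, while
-1 (or omitting the parameter) keeps the driver's own decision.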

[0] Example of this issue, observed on Steam Deck:

[drm] kiq ring mec 2 pipe 1 q 0
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu :04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring 
vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block  
failed -110
amdgpu :04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu :04:00.0: amdgpu: Fatal error during GPU init

Cc: James Zhu 
Cc: Lazar Lijo 
Cc: Leo Liu 
Cc: Mario Limonciello 
Cc: Sonny Jiang 
Signed-off-by: Guilherme G. Piccoli 
---


This work is based on agd5f/amd-staging-drm-next branch.
Thanks in advance for reviews/comments!
Cheers,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 872450a3a164..5d3c92c94f18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -215,6 +215,7 @@ extern int amdgpu_noretry;
 extern int amdgpu_force_asic_type;
 extern int amdgpu_smartshift_bias;
 extern int amdgpu_use_xgmi_p2p;
+extern int amdgpu_indirect_sram;
 #ifdef CONFIG_HSA_AMD
 extern int sched_policy;
 extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 06aba201d4db..c7182c0bc841 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -187,6 +187,7 @@ int amdgpu_num_kcq = -1;
 int amdgpu_smartshift_bias;
 int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
+int amdgpu_indirect_sram = -1;
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -941,6 +942,14 @@ MODULE_PARM_DESC(smu_pptable_id,
"specify pptable id to be used (-1 = auto(default) value, 0 = use 
pptable from vbios, > 0 = soft pptable id)");
 module_param_named(smu_pptable_id, amdgpu_smu_pptable_id, int, 0444);
 
+/**
+ * DOC: indirect_sram (int)
+ * Allow users to force using (or not) the VCN indirect SRAM mode in the fw 
load
+ * code. Default is -1, meaning auto (aka, don't mess with driver's behavior).
+ */
+MODULE_PARM_DESC(indirect_sram, "Force VCN indirect SRAM (-1 = auto (default), 
0 = disabled, 1 = enabled)");
+module_param_named(indirect_sram, amdgpu_indirect_sram, int, 0444);
+
 /* These devices are not supported by amdgpu.
  * They are supported by the mach64, r128, radeon drivers
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1f880e162d9d..a2290087e01c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -137,6 +137,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
return -EINVAL;
}
 
+   if (amdgpu_indirect_sram >= 0)
+   adev->vcn.indirect_sram = (bool)amdgpu_indirect_sram;
+
hdr = (const struct common_firmware_header *)adev->vcn.fw->data;
adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
 
-- 
2.39.0



[PATCH] drm/amd/display: Fix warning caused by a bad fake property

2023-01-15 Thread Guilherme G. Piccoli
Commit 5468c36d6285 ("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS")
added a validation on driver probe to prevent invalid TMDS modes. For that,
it needs to fake some plane properties before submitting them to the
validation routines.

It happens that one of these fake properties (swizzle) causes a boot-time
warning due to an assert in the swizzle routines. Since it's a fake property,
change it here to a valid value to prevent the warning. Also, while at it,
fix the unnecessary double assignment of the same property in the code.

Fixes: 5468c36d6285 ("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2264
Reported-by: Melissa Wen 
Cc: Fangzhi Zuo 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Mark Broadworth 
Cc: Rodrigo Siqueira 
Cc: Sung Joon Kim 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, note that my choice here was to set swizzle to DC_SW_LINEAR;
I considered creating a new enum member (like "DC_SW_FAKE", or
something like that), but it seems easier to just use the LINEAR
one. In my (very cheap!) understanding, this shouldn't affect the
purpose of the TMDS validation.

Lemme know if there's something to improve here, and thanks in
advance for reviews/comments.
Cheers,


Guilherme


 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1b7f20a9d4ae..35ab39fd9345 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6261,9 +6261,8 @@ static enum dc_status 
dm_validate_stream_and_context(struct dc *dc,
dc_plane_state->plane_size.surface_size.width  = stream->src.width;
dc_plane_state->plane_size.chroma_size.height  = stream->src.height;
dc_plane_state->plane_size.chroma_size.width   = stream->src.width;
-   dc_plane_state->tiling_info.gfx9.swizzle =  DC_SW_UNKNOWN;
dc_plane_state->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB;
-   dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN;
+   dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_LINEAR;
dc_plane_state->rotation = ROTATION_ANGLE_0;
dc_plane_state->is_tiling_rotated = false;
dc_plane_state->tiling_info.gfx8.array_mode = DC_ARRAY_LINEAR_GENERAL;
-- 
2.39.0



Re: [PATCH 2/3] drm/amdgpu: stop freeing PSP buffers during suspend

2022-11-17 Thread Guilherme G. Piccoli
Hi Christian, thanks for the fix! It worked fine on Steam Deck running
the game Cuphead, feel free to add my:

Tested-by: Guilherme G. Piccoli 




Re: [PATCH 3/3] drm/amdgpu: WARN when freeing kernel memory during suspend

2022-11-17 Thread Guilherme G. Piccoli
Hey Christian, very interesting idea! I've tested it alone (without
patch 2) and was able to see the PSP code freeing buffers during
suspend, leading to a resume failure later.

Feel free to add my:
Tested-by: Guilherme G. Piccoli 

Cheers,


Guilherme


Re: [PATCH] drm/amdgpu/psp: don't free PSP buffers on suspend

2022-11-16 Thread Guilherme G. Piccoli
Thanks for the fix, feel free to add my:

Tested-by: Guilherme G. Piccoli 


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
On 10/12/2021 11:16, Alex Deucher wrote:
> [...]
> Why not just reload the driver after kexec?
> 
> Alex

Because the original issue is the kdump case, and we want a very, very
tiny kernel - also, the crash could originally have been caused by
amdgpu itself, so if it's a GPU issue, we don't want to mess with that
in kdump. And I confess I tried modprobe amdgpu after a kdump, with no
success - kdump won't call the shutdown handlers, so the GPU will be
in a "rogue" state...

My question was about regular kexec because it's usually much simpler -
we can do whatever we want there. My line of thought was: if I make it
work in a regular kexec with a simple framebuffer, I might be able to
get it working on kdump heheh




Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
On 10/12/2021 12:13, Christian König wrote:
> [...]
> How about issuing a PCIe reset and re-initializing the ASIC with just 
> the VBIOS?
> 
> That should be pretty straightforward I think.
> 
> Christian.


Thanks Christian, that'd be perfect! Is it feasible? Per Alex's comment,
we'd need to run atombios commands to reprogram the timings, display
info, etc. - like a small driver would do, a full init.

Also, what kind of PCIe reset is recommended for this adapter? A hot
reset, a power-off/re-power cycle, FLR, or the MODE2 reset present in
the amdgpu code? Remember this is an APU device.

Thanks a lot!



Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-10 Thread Guilherme G. Piccoli
Thanks a lot Alex / Gerd and Thomas, very informative stuff! I'm glad
there are projects to collect/save the data and reuse it after a kdump;
this is very useful.

I'll continue my study of the atombios side of AMD and QXL; maybe at
least we can make it work in qemu, that'd be great (like a small
init driver to reprogram the paravirtual device on kexec boot).

Cheers,


Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
Thanks again Alex! Some comments inlined below:

On 09/12/2021 15:06, Alex Deucher wrote:
> Not really in a generic way.  It's asic and platform specific.  In
> addition most modern displays require link training to bring up the
> display, so you can't just save and restore registers.

Oh sure, I understand that. My question is more like: is there a way,
inside the amdgpu driver, to save this state before taking
over/overwriting/reprogramming the device? That way we could (again,
from inside the amdgpu driver) restore this pre-saved state in the
shutdown handler, for example, leaving the device in a "pre-OS" state
when the new kexec'ed kernel starts.

> 
> The drivers are asic and platform specific.  E.g., the driver for
> vangogh is different from renoir is different from skylake, etc.  The
> display programming interfaces are asic specific.

Cool, that makes sense! But if you (or anybody here) know some of these
GOP drivers, e.g. for the qemu/qxl device, I'm just curious to
see/understand how complex the FW driver is that puts the
device/screen in a usable state.

Cheers,


Guilherme


Re: Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
On 09/12/2021 14:31, Alex Deucher wrote:
> [...] 
> Once the driver takes over, none of the pre-driver state is retained.
> You'll need to load the driver in the new kernel to initialize the
> displays.  Note the efifb doesn't actually have the ability to program
> any hardware, it just takes over the memory region that was used for
> the pre-OS framebuffer and whatever display timing was set up by the
> GOP driver prior to the OS loading.  Once that OS driver has loaded
> the area is gone and the display configuration may have changed.
> 

Hi Christian and Alex, thanks for the clarifications!

Is there any way to save/retain this state before amdgpu takes over?
Would simpledrm be able to program the device again, into a working state?

Finally, do you have any example of such a GOP driver (open source) so I
can take a look? I tried to find something like that in the Tianocore
project, but didn't find anything that seemed useful for my issue.

Thanks again!


Reuse framebuffer after a kexec (amdgpu / efifb)

2021-12-09 Thread Guilherme G. Piccoli
Hi all, I have a question about the possibility of reusing a framebuffer
after a regular (or panic) kexec - my case is with amdgpu (an APU, i.e.,
not a separate GPU), but I guess the question is kinda generic,
hence I've looped in most of the lists / people I think make sense
(apologies for duplicates).


The context is: we have hardware with an amdgpu-controlled device
(Vangogh model), and as soon as the machine boots, efifb provides
graphics - I understand the UEFI/GRUB outputs rely on the EFI
framebuffer as well. As soon as the amdgpu module is available, the
kernel loads it and it takes over the GPU, providing graphics. The
kexec_file_load syscall allows passing a valid screen_info structure,
so by kexec'ing a new kernel we again have efifb taking over at boot
time, but this time I see nothing on the screen. I've manually
blacklisted amdgpu in this new kexec'ed kernel, since I'd like to rely
on the simple framebuffer - the goal is to kexec a tiny kernel. I'm
using kernel version 5.16.0-rc4.

I've done some other experiments. For example: I've forced the
screen_info model to match VLFB, so vesafb took over after the kexec,
with the same result. I also noticed that the BusMaster bit was off
after kexec in the AMD APU PCIe device, so I set it in efifb before
probe; and finally I tested the same things in qemu, with qxl - all
with the same result (blank screen).
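
(The BusMaster experiment was roughly the following, assuming we can
look up the APU's pci_dev from the efifb probe path - 04:00.0 being
the device's address on this machine:)

	struct pci_dev *pdev = pci_get_domain_bus_and_slot(0, 0x04,
							   PCI_DEVFN(0, 0));

	if (pdev) {
		pci_set_master(pdev);	/* Bus Master is lost across kexec */
		pci_dev_put(pdev);
	}
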
The most interesting result I got (both with amdgpu and qemu/qxl) is
that if I blacklist these drivers and let the machine keep using efifb
from the beginning, efifb is still able to produce graphics after the
kexec.

This then led me to think that there's likely something fundamentally
"blocking" the reuse of the simple framebuffer after kexec - maybe the
DRM stack is destroying the old framebuffer somehow? What kind of
preparation is required at the firmware level to make the simple EFI VGA
framebuffer work, and could we perform this in a kexec (or "save it"
before the amdgpu/qxl drivers take over, and reuse it later)?

Any advice is greatly appreciated!
Thanks in advance,


Guilherme