Re: [PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"
On 12/07/2023 11:47, Pillai, Aurabindo wrote:
> Hi Guilherme,
>
> Sorry there was one more patch which I missed to attach. Please add this
> 3rd patch and retry.
>
> Reverting that patch would cause high power consumption on Navi2x GPU
> also cause hangs on certain multi monitor configurations. With these 3
> patches, you're getting the same effect as reverting the aforementioned
> patches, but it makes the reverted sequence available only for Steam
> deck hardware.
>

Thanks a lot for your detailed explanation, and the 3rd patch! Indeed,
amdgpu works fine on Deck with that - great =)

Feel free to add my:

Tested-by: Guilherme G. Piccoli #Steam Deck

Oh, a fixes tag would also make sense, I guess.

BTW, if possible, please submit the 3 patches in a proper series to get
them merged in the 6.5-rc cycle (the sooner the better) - I'd really
appreciate it!

Cheers,

Guilherme
Re: [PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"
On 11/07/2023 15:22, Aurabindo Pillai wrote:
> [...]
> Hi,
>
> Sorry for the delayed response, this patch went unnoticed. This revert would
> break asics. Could you try the attached patch without reverting this one ?

Hi Aurabindo, thanks for your response!

I've tried kernel 6.5-rc1, and it seems the issue is present, due to the
patch being merged on Linus tree [as 1598fc576420 ("drm/amd/display:
Program OTG vtotal min/max selectors unconditionally for DCN1+")].

Then, I tried both your attached patches on top of that, and
unfortunately, the behavior is the same: Steam Deck doesn't boot with
graphics, and we can see the single error "amdgpu 0000:04:00.0: [drm]
*ERROR* [CRTC:67:crtc-0] flip_done timed out" on dmesg.

Do you / Alex think we could get this revert for 6.5-rc2, so at least we
could boot mainline there while the issue is handled? It would be an
intermediate fix. You mentioned it breaks some asics, but did they work
until now, without your patch?

Thanks,

Guilherme
[PATCH] Revert "drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+"
This reverts commit 06c3a652a787efc960af7c8816036d25c4227c6c.

After this commit, the Steam Deck cannot boot with graphics anymore; the
following message is observed on dmesg:

"[drm] ERROR [CRTC:67:crtc-0] flip_done timed out"

No other error is observed, it just stays like that. After bisecting
amd-staging-drm-next, we narrowed it down to this commit. Seems it makes
sense to revert it to have the tree bootable until a proper solution is
worked out.

Cc: Aurabindo Pillai
Cc: André Almeida
Cc: Melissa Wen
Cc: Rodrigo Siqueira
Signed-off-by: Guilherme G. Piccoli
---

Hi Alex / Aurabindo, we couldn't boot the Deck with HEAD of
amd-staging-drm-next; git bisect led to this commit. Since its
description already mentions a potential proper solution, related to the
DMCUB (and some complex state tracking), I thought it was more effective
to revert it to allow booting the tree on the Deck (and maybe other HW -
I just tested the Deck BTW). Lemme know your thoughts.

Special thanks to André and Melissa for helping the debug / bisect!
We're open to testing alternative patches, feel free to ping.

Cheers,

Guilherme


 drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 15 ---
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c | 10 --
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
index 0e8f4f36c87c..27419cd98264 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
@@ -945,10 +945,19 @@ void optc1_set_drr(
 				OTG_FORCE_LOCK_ON_EVENT, 0,
 				OTG_SET_V_TOTAL_MIN_MASK_EN, 0,
 				OTG_SET_V_TOTAL_MIN_MASK, 0);
-	}
-	// Setup manual flow control for EOF via TRIG_A
-	optc->funcs->setup_manual_trigger(optc);
+		// Setup manual flow control for EOF via TRIG_A
+		optc->funcs->setup_manual_trigger(optc);
+
+	} else {
+		REG_UPDATE_4(OTG_V_TOTAL_CONTROL,
+				OTG_SET_V_TOTAL_MIN_MASK, 0,
+				OTG_V_TOTAL_MIN_SEL, 0,
+				OTG_V_TOTAL_MAX_SEL, 0,
+				OTG_FORCE_LOCK_ON_EVENT, 0);
+
+		optc->funcs->set_vtotal_min_max(optc, 0, 0);
+	}
 }
 
 void optc1_set_vtotal_min_max(struct timing_generator *optc, int vtotal_min, int vtotal_max)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
index 58bdbd859bf9..d6f095b4555d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_optc.c
@@ -462,16 +462,6 @@ void optc2_setup_manual_trigger(struct timing_generator *optc)
 {
 	struct optc *optc1 = DCN10TG_FROM_TG(optc);
 
-	/* Set the min/max selectors unconditionally so that
-	 * DMCUB fw may change OTG timings when necessary
-	 * TODO: Remove the w/a after fixing the issue in DMCUB firmware
-	 */
-	REG_UPDATE_4(OTG_V_TOTAL_CONTROL,
-			OTG_V_TOTAL_MIN_SEL, 1,
-			OTG_V_TOTAL_MAX_SEL, 1,
-			OTG_FORCE_LOCK_ON_EVENT, 0,
-			OTG_SET_V_TOTAL_MIN_MASK, (1 << 1)); /* TRIGA */
-
 	REG_SET_8(OTG_TRIGA_CNTL, 0,
 			OTG_TRIGA_SOURCE_SELECT, 21,
 			OTG_TRIGA_SOURCE_PIPE_SELECT, optc->inst,
-- 
2.40.1
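The "bisecting amd-staging-drm-next" step mentioned in the patch description can be automated with `git bisect run`. A rough sketch in a throwaway repository - everything here is a stand-in: one commit ("offender") introduces a detectable regression, and the test command is a grep instead of the real boot-to-graphics check used on the Deck:

```shell
#!/bin/sh
# Scratch-repo sketch of an automated bisect; exit 0 from the test command
# marks a commit good, exit 1 marks it bad.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email test@example.com
git config user.name test

# Build a tiny history where one commit introduces a "regression".
for i in 1 2 3; do
	echo "ok $i" > state
	git add state
	git commit -qm "good $i"
done
echo "broken" > state
git add state
git commit -qm "offender"
for i in 4 5; do
	echo "more $i" >> state
	git add state
	git commit -qm "after $i"
done

git bisect start HEAD HEAD~5 > /dev/null
# 'git bisect run' replays the test at each step until the first bad
# commit is found; bisect leaves it in refs/bisect/bad.
git bisect run sh -c '! grep -q broken state' > /dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null
echo "first bad commit: $first_bad"
```

With a real kernel tree, the test command would be a build-and-boot script instead of the grep, but the `good`/`bad` exit-code convention is the same.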
Re: [PATCH 2/2] drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well
On 12/05/2023 10:29, Alex Deucher wrote:
> [...]
>> BTW, I couldn't test this one since I don't have GFX11 HW, so appreciate
>> if anyone from AMD (or any community member) could give it a try...
>
> We've tested this on gfx11 and applied the patches.  Thanks!
>
> Alex
>

That's great, thanks a lot Alex!
Re: [PATCH 2/2] drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well
On 09/05/2023 13:49, Bas Nieuwenhuizen wrote:
> From: "Guilherme G. Piccoli"
>
> (Bas: speculative change to mirror gfx10/gfx9)
>
> Signed-off-by: Guilherme G. Piccoli
> Cc: Alex Deucher
> ---

Thanks a lot for both patches Bas! For this second one, even though I
attached it on gitlab, you should also sign off since you're the one
sending it to the ML (I guess heh)

BTW, I couldn't test this one since I don't have GFX11 HW, so I'd
appreciate it if anyone from AMD (or any community member) could give it
a try...

Cheers,

Guilherme
Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 20/04/2023 12:56, gre...@linuxfoundation.org wrote:
> [...]
> That's 3000+ emails ago for me :)

/head_exploding

this is > 1000 emails per day, wow... my sympathies to you heh

> [...]
>> tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
>> hereby requesting the merge. Some backport/context adjustment was
>> necessary and it was properly tested in the Steam Deck.
>
> Ok, we'll queue it up soon, thanks.
>
> greg k-h

Thanks =)
Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 20/04/2023 12:02, gre...@linuxfoundation.org wrote:
>> [...]
>>> Which "one" are you referring to here?
>>>
>>> confused,
>>>
>>> greg k-h
>>
>> This one, sent in this email thread.
>
> I don't have "this email thread" anymore, remember, some of us get
> thousand+ emails a day...

I don't really understand the issue to be honest, we are talking in the
very email thread! The email was sent April/18, it's not old or anything.

But in any case, for reference, this is the original email from the lore
archives:
https://lore.kernel.org/stable/20230418221522.1287942-1-gpicc...@igalia.com/

>
>> The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
>> Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
>> hash(es) is 542a56e8eb44 heh
>
> But that commit says it fixes a problem in the 6.2 tree, why is this
> relevant for 6.1.y?
>

That is explained in the email, and the very reason for that is the
duplicate hashes we are discussing here.

The fix commit in question points the "Fixes:" tag to 82132ecc5432
("drm/amdgpu: enable Vangogh VCN indirect sram mode"), which appears to
be in the 6.2 tree, right? But notice that 9a8cc8cabc1e ("drm/amdgpu:
enable Vangogh VCN indirect sram mode") is the *same* offender and... is
present on 6.1!

In other words, when I first wrote this fix, I just checked the tree
quickly and came up with "Fixes: 82132ecc5432", but to be thorough, I
should have pointed the fixes tag to 9a8cc8cabc1e, to pick it on 6.1.y.

tl;dr: the offender is present on 6.1.y, but this fix is not, hence I'm
hereby requesting the merge. Some backport/context adjustment was
necessary and it was properly tested in the Steam Deck.

Thanks,

Guilherme
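The "is this offender actually contained in 6.1?" question above can be answered mechanically with `git merge-base --is-ancestor`; in a real kernel tree the one-liner would be `git merge-base --is-ancestor 9a8cc8cabc1e v6.1`. A scratch-repo sketch with two fake release tags (names are placeholders):

```shell
#!/bin/sh
# Demonstrates commit-containment checking: --is-ancestor exits 0 when the
# commit is reachable from the given ref, nonzero otherwise.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email test@example.com
git config user.name test

echo a > f; git add f; git commit -qm "base"
git tag v6.1
echo b >> f; git commit -qam "merged after 6.1"
git tag v6.2
commit=$(git rev-parse v6.2)

git merge-base --is-ancestor "$commit" v6.2 && in_62=yes || in_62=no
git merge-base --is-ancestor "$commit" v6.1 && in_61=yes || in_61=no
echo "contained in v6.2: $in_62, in v6.1: $in_61"
```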
Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 20/04/2023 09:42, gre...@linuxfoundation.org wrote:
> [...]
>> @Greg, can you pick this one? Thanks!
>
> Which "one" are you referring to here?
>
> confused,
>
> greg k-h

This one, sent in this email thread.

The title of the patch is "drm/amdgpu/vcn: Disable indirect SRAM on
Vangogh broken BIOSes", target is 6.1.y and (one of the) upstream
hash(es) is 542a56e8eb44 heh

Cheers,

Guilherme
Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 19/04/2023 17:04, Deucher, Alexander wrote:
> [...]
>> So, let me check if I understood properly: there are 2 trees (-fixes and
>> -next), and the problem is that their merge onto mainline happens apart
>> and there are kind of duplicate commits, that were first merged on
>> -fixes, then "re-merged" on -next, right?
>>
>
> Each subsystem works on new features, then when the merge window opens,
> Linus pulls them into mainline.  At that point, mainline goes into RCs
> and then mainline is bug fixes only.  Meanwhile in parallel, each
> subsystem is working on new feature development for the next merge
> window (subsystem -next trees).  The trees tend to lag behind mainline
> a bit.  Most developers focus on them in support of upcoming features.
> They are usually also where bugs are fixed.  If a bug is fixed in the
> -next tree, what's the best way to get it into mainline?  I guess
> ideally it would be fixed in mainline and then mainline would be merged
> into everyone's -next branch, but that's not always practical.  Often
> times new features depend on bug fixes and then you'd end up stalling
> new development waiting for a back merge, or you'd have to rebase your
> -next branches regularly so that they would shed any patches in
> mainline which is not great either.  We try and cherry-pick -x when we
> can to show the relationship.
>
>> Would it be possible to clean the -next tree right before the merge,
>> using some script to find commits there that are going to be merged
>> but are already present? Then you'd have a -next-to-merge tree that is
>> clean and doesn't present dups, avoiding this.
>
> There's no real way to clean it without rewriting history.  You'd
> basically need to do regular backmerges and rebases of the -next trees.
> Also then what do you do if you already have a feature in -next (say
> Dave or Daniel have already pulled in my latest amdgpu PR for -next)
> and you realize that one of those patches is an important bug fix for
> mainline?  If the drm -next tree rebased then all of the other drm
> driver trees that feed into it would need to rebase as well otherwise
> they'd have stale history.
>
> You also have the case of a fix in -next needing to land in mainline,
> but due to differences in the trees, it needs backporting to apply
> properly.
>
>>
>> Disclaimer: I'm not a maintainer, maybe there are smart ways of doing
>> that or perhaps my suggestion is silly and unfeasible heh But seems
>> likely that other maintainers face this problem as well and came up
>> with some solution - avoiding the dups would be a good thing, IMO, if
>> possible.
>
> No worries.  I agree.  I haven't seen anything so far that really
> handles this effectively.
>
> Alex

Thanks a bunch Alex, I appreciate your time detailing the issue, which
seems... quite complex and annoying, indeed ={

What I'll do from now on is try to check all hashes that match a commit,
so I can add more than one fixes tag if that makes sense. At least, this
way I can prevent missing fixes in stable like this patch heh

@Greg, can you pick this one? Thanks!

Cheers,

Guilherme
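The "check all hashes that match a commit" step can lean on `git patch-id`, which hashes the diff itself and is therefore stable across cherry-picks; in a kernel tree one would pipe `git show <hash>` for each candidate into `git patch-id --stable` and compare. A scratch-repo sketch of the two-hashes-one-change situation (82132ecc5432 vs. 9a8cc8cabc1e):

```shell
#!/bin/sh
# Shows that two distinct commit hashes carrying the same change share a
# patch-id, which is how duplicate fixes can be detected across branches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email test@example.com
git config user.name test

echo base > f; git add f; git commit -qm "base"
echo fix >> f; git commit -qam "the fix"       # the "-fixes" copy
fixes_hash=$(git rev-parse HEAD)

git checkout -qb next main~1                   # -next branched before the fix
echo other > g; git add g; git commit -qm "unrelated -next work"
git cherry-pick "$fixes_hash" > /dev/null      # same change, new hash
next_hash=$(git rev-parse HEAD)

# patch-id of a commit: hash of its normalized diff
pid() { git show "$1" | git patch-id --stable | cut -d' ' -f1; }
echo "hashes differ:    $fixes_hash != $next_hash"
echo "patch-ids equal:  $(pid "$fixes_hash") = $(pid "$next_hash")"
```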
Re: [PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
On 19/04/2023 10:16, Deucher, Alexander wrote:
> [...]
>> This is quite strange for me, we have 2 commit hashes pointing to the
>> *same* commit, and each one is present... in a different release!!?!
>> Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y
>> stable misses it, since it only contains 9a8cc8cabc1e (which is the
>> same patch!).
>>
>> Alex, do you have an idea why sometimes commits from the AMD tree
>> appear duplicated in mainline? Specially in different releases, this
>> could cause some confusion I guess.
>
> This is pretty common in the drm.  The problem is there is not a good
> way to get bug fixes into both -next and -fixes without constant back
> merges.  So the patches end up in -next and if they are important for
> stable, they go to -fixes too.  We don't want -next to be broken, but
> we can't wait until the next kernel cycle to get the bug fixes into
> the current kernel and stable.  I don't know of a good way to make
> this smoother.
>
> Alex

Thanks Alex, seems quite complicated indeed.

So, let me check if I understood properly: there are 2 trees (-fixes and
-next), and the problem is that their merge onto mainline happens apart
and there are kind of duplicate commits, that were first merged on
-fixes, then "re-merged" on -next, right?

Would it be possible to clean the -next tree right before the merge,
using some script to find commits there that are going to be merged but
are already present? Then you'd have a -next-to-merge tree that is clean
and doesn't present dups, avoiding this.

Disclaimer: I'm not a maintainer, maybe there are smart ways of doing
that or perhaps my suggestion is silly and unfeasible heh But it seems
likely that other maintainers face this problem as well and came up with
some solution - avoiding the dups would be a good thing, IMO, if
possible.

Cheers,

Guilherme
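The "script to find commits that are going to be merged but are already present" suggested above essentially exists as `git cherry`, which compares patch-ids rather than hashes. A scratch-repo sketch (branch names are made up):

```shell
#!/bin/sh
# 'git cherry <upstream> <topic>' prefixes topic commits whose patch-id is
# already upstream with '-' and genuinely new ones with '+' -- i.e. the
# dup-finding pass discussed in the thread.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b mainline
git config user.email test@example.com
git config user.name test

echo base > f; git add f; git commit -qm "base"
git checkout -qb next
echo fix >> f; git commit -qam "bug fix"               # lands in -next first
echo feat > g; git add g; git commit -qm "new feature"
git checkout -q mainline
echo fix >> f; git commit -qam "bug fix (via -fixes)"  # same diff, new hash

git cherry mainline next
dups=$(git cherry mainline next | grep -c '^-')
new=$(git cherry mainline next | grep -c '^+')
echo "already upstream: $dups, still pending: $new"
```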
[PATCH 6.1.y] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
commit 542a56e8eb4467ae654eefab31ff194569db39cd upstream.

The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevent this mode from working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:

[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]

Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the HW
to probe properly and the VCN IP block to work fine.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Fixes: 9a8cc8cabc1e ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: sta...@vger.kernel.org
Cc: James Zhu
Cc: Leo Liu
Signed-off-by: Guilherme G. Piccoli
Signed-off-by: Alex Deucher
---

Hi folks, this was build/boot tested on Deck. I've also adjusted the
context; the function was reworked on 6.2.

But what a surprise it was for me to not see this fix already in 6.1.y,
since I've CCed stable - and the reason for that is really peculiar:

$ git log -1 --pretty="%an <%ae>: %s" 82132ecc5432
Leo Liu : drm/amdgpu: enable Vangogh VCN indirect sram mode

$ git describe --contains 82132ecc5432
v6.2-rc1~124^2~1^2~13

$ git log -1 --pretty="%an <%ae>: %s" 9a8cc8cabc1e
Leo Liu : drm/amdgpu: enable Vangogh VCN indirect sram mode

$ git describe --contains 9a8cc8cabc1e
v6.1-rc8~16^2^2

This is quite strange for me: we have 2 commit hashes pointing to the
*same* commit, and each one is present... in a different release!!?!
Since I've marked this patch as fixing 82132ecc5432 originally, 6.1.y
stable misses it, since it only contains 9a8cc8cabc1e (which is the same
patch!).

Alex, do you have an idea why sometimes commits from the AMD tree appear
duplicated in mainline? Specially in different releases, this could
cause some confusion I guess.

Thanks in advance,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index ce64ca1c6e66..5c1193dd7d88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -84,6 +85,7 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 {
 	unsigned long bo_size;
 	const char *fw_name;
+	const char *bios_ver;
 	const struct common_firmware_header *hdr;
 	unsigned char fw_check;
 	unsigned int fw_shared_size, log_offset;
@@ -159,6 +161,21 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
 		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
 			adev->vcn.indirect_sram = true;
+
+		/*
+		 * Some Steam Deck's BIOS versions are incompatible with the
+		 * indirect SRAM mode, leading to amdgpu being unable to get
+		 * properly probed (and even potentially crashing the kernel).
+		 * Hence, check for these versions here - notice this is
+		 * restricted to Vangogh (Deck's APU).
+		 */
+		bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+		if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+		     !strncmp("F7A0114", bios_ver, 7))) {
+			adev->vcn.indirect_sram = false;
+			dev_info(adev->dev,
+				"Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver);
+		}
 		break;
 	case IP_VERSION(3, 0, 16):
 		fw_name = FIRMWARE_DIMGREY_CAVEFISH;
-- 
2.40.0
[PATCH] drm/amd/pm: Fix incorrect comment about Vangogh power cap support
The comment mentions that power1 cap attributes are not supported on
Vangogh, but the opposite is indeed valid: for APUs, only Vangogh is
supported. While at it, also fixed the Renoir comment below (thanks
Melissa for noticing that!).

Cc: Lijo Lazar
Cc: Melissa Wen
Signed-off-by: Guilherme G. Piccoli
---

Hi folks, this is a very trivial one, just to fix the comments - since I
needed to re-read the code/comments to figure it out, I'm trying to
improve it here to avoid others getting confused as well heh

Lemme know in case I made some mistake (even after re-re-reading the
code).

Cheers,

Guilherme


 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d75a67cfe523..6460906c354c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -3351,7 +3351,7 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
 	     attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr))
 		/* can't manage state */
 		effective_mode &= ~S_IWUSR;
-	/* not implemented yet for GC 10.3.1 APUs */
+	/* In the case of APUs, this is only implemented on Vangogh */
 	if (((adev->family == AMDGPU_FAMILY_SI) ||
 	     ((adev->flags & AMD_IS_APU) && (gc_ver != IP_VERSION(10, 3, 1)))) &&
 	    (attr == &sensor_dev_attr_power1_cap_max.dev_attr.attr ||
@@ -3360,7 +3360,7 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
 	     attr == &sensor_dev_attr_power1_cap_default.dev_attr.attr))
 		return 0;
 
-	/* not implemented yet for APUs having <= GC 9.3.0 */
+	/* not implemented yet for APUs having < GC 9.3.0 (Renoir) */
 	if (((adev->family == AMDGPU_FAMILY_SI) ||
 	     ((adev->flags & AMD_IS_APU) && (gc_ver < IP_VERSION(9, 3, 0)))) &&
 	    (attr == &sensor_dev_attr_power1_average.dev_attr.attr))
-- 
2.40.0
Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes
On 28/03/2023 03:07, Yang, WenYou wrote:
> [AMD Official Use Only - General]
> [...]
>> Hi Wenyou, thank you for the clarification and for the interesting
>> patch set!
>>
>> So, just so I can understand: is it expected that gamers disable SMT?
>> I heard some games got their performance improved, but not sure the
>> reason... if you have thoughts on that, I'm pretty interested!
>
> Hi Guilherme,
>
> No, it not. It is not to disable SMT.
>
> Yes, there is a commit to get performance improved.
> https://github.com/torvalds/linux/commit/a8fb40966f19ff81520d9ccf8f7e2b95201368b8
>
> Best Regards,
> Wenyou

Thanks, this one has been present in Deck's kernel for a while.

Cheers,

Guilherme
Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes
On 26/03/2023 23:49, Yang, WenYou wrote:
> Hi Guilherme,
>
> Thank you for your attention on the patch set.
> The purpose of the patch set is to improve the performance when playing
> the game.
>
> Best Regards,
> Wenyou

Hi Wenyou, thank you for the clarification and for the interesting patch
set!

So, just so I can understand: is it expected that gamers disable SMT? I
heard some games got their performance improved, but not sure the
reason... if you have thoughts on that, I'm pretty interested!

Cheers,

Guilherme
Re: [Resend PATCH v1 0/3] send message to pmfw when SMT changes
Hi Wenyou Yang, first of all thanks for the improvement!

I'd like to ask you (and all CCed) if it would be possible to explain a
bit the goal / functionality behind these patches. By reading the commit
descriptions and code, I can understand code-wise what's going on and
how this will message the FW on SMT changes. What I couldn't parse is
the purpose of this, or in other words, what do we gain from it? Also,
why only on Vangogh?

Since I don't have the spec, I couldn't read and learn myself -
apologies if this is somewhat a silly question. Also, if for some reason
you cannot respond (like a HW "NDA"), it's fine too.

Thanks in advance,

Guilherme
[PATCH] drm/amdgpu/vcn: Disable indirect SRAM on Vangogh broken BIOSes
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevent this mode from working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:

[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]

Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the HW
to probe properly and the VCN IP block to work fine.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc5432 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: sta...@vger.kernel.org
Cc: James Zhu
Cc: Leo Liu
Signed-off-by: Guilherme G. Piccoli
---

Hi folks, based on the feedback from the gitlab issue, here is the
upstream attempt to quirk the Steam Deck's BIOSes having known issues
with the indirect SRAM mode. I've tested it on both the quirked BIOSes,
and also with some working ones.

This patch is based on agd5f/amd-staging-drm-next. Thanks in advance for
reviews!

Cheers,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 02d428ddf2f8..dc4f3f4cb644 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -26,6 +26,7 @@
 #include
 #include
+#include
 #include
 #include
 #include
@@ -114,6 +115,24 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 	    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
 		adev->vcn.indirect_sram = true;
 
+	/*
+	 * Some Steam Deck's BIOS versions are incompatible with the
+	 * indirect SRAM mode, leading to amdgpu being unable to get
+	 * properly probed (and even potentially crashing the kernel).
+	 * Hence, check for these versions here - notice this is
+	 * restricted to Vangogh (Deck's APU).
+	 */
+	if (adev->ip_versions[UVD_HWIP][0] == IP_VERSION(3, 0, 2)) {
+		const char *bios_ver = dmi_get_system_info(DMI_BIOS_VERSION);
+
+		if (bios_ver && (!strncmp("F7A0113", bios_ver, 7) ||
+		     !strncmp("F7A0114", bios_ver, 7))) {
+			adev->vcn.indirect_sram = false;
+			dev_info(adev->dev,
+				"Steam Deck quirk: indirect SRAM disabled on BIOS %s\n", bios_ver);
+		}
+	}
+
 	hdr = (const struct common_firmware_header *)adev->vcn.fw->data;
 	adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
-- 
2.39.2
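For the curious, the quirk's matching rule can be restated from userspace: the kernel patch uses `strncmp(..., 7)`, i.e. a 7-character prefix match, so any `F7A0113*` or `F7A0114*` BIOS build string is caught. A hedged sketch - the sysfs path mirrors what `dmi_get_system_info(DMI_BIOS_VERSION)` sees in the kernel, and `is_quirked` is a made-up helper name:

```shell
#!/bin/sh
# Userspace re-statement of the patch's BIOS-version prefix match.
is_quirked() {
	case "$1" in
	F7A0113*|F7A0114*) echo yes ;;
	*) echo no ;;
	esac
}

# On machines without DMI info the sysfs file is absent; fall back gracefully.
bios_ver=$(cat /sys/class/dmi/id/bios_version 2>/dev/null || echo unknown)
echo "BIOS $bios_ver -> quirk applies: $(is_quirked "$bios_ver")"
```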
Re: [PATCH v3] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 02/02/2023 13:12, Luben Tuikov wrote:
> Hi Guilherme,
>
> Thanks for redoing to a v3. This patch is:
>
> Reviewed-by: Luben Tuikov
>
> Regards,
> Luben
>

Thank you for the reviews Luben, much appreciated!
Re: [PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 02/02/2023 08:58, Christian König wrote:
> [...]
>> +	if (!ring->no_scheduler && ring->sched.ops)
>>  		drm_sched_fini(&ring->sched);
>
> I think we should drop the check for no_scheduler here and just call
> drm_sched_fini() when the scheduler instance was initialized before.
>
> Background is that I've found a couple of places where we potentially
> set no_scheduler to false after the scheduler was already initialized.
>
> This is then probably leading to a memory leak or worth.
>
> Regards,
> Christian.

Thanks Christian, nice finding! And thanks for the reviews Guchun /
Luben =)

I just submitted a V3 [0], but didn't add the review tags from Guchun /
Luben, since the patch changed.

Cheers,

Guilherme

[0] https://lore.kernel.org/dri-devel/20230202134856.1382169-1-gpicc...@igalia.com/
[PATCH v3] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck recently,
and the function drm_sched_fini() was called even without its
counterpart having been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
[...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Notice ideally we'd use sched.ready for that; such field is set as the
latest thing on drm_sched_init(). But amdgpu seems to "override" the
meaning of such field - in the above oops for example, it was a GFX ring
causing the crash, and the sched.ready field was set to true in the ring
init routine, regardless of the state of the DRM scheduler. Hence, we
ended-up using sched.ops as per Christian's suggestion [0], and also
removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König
Cc: Guchun Chen
Cc: Luben Tuikov
Cc: Mario Limonciello
Signed-off-by: Guilherme G. Piccoli
---

 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..faff4a3f96e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 		if (!ring || !ring->fence_drv.initialized)
 			continue;
 
-		if (!ring->no_scheduler)
+		/*
+		 * Notice we check for sched.ops since there's some
+		 * override on the meaning of sched.ready by amdgpu.
+		 * The natural check would be sched.ready, which is
+		 * set as drm_sched_init() finishes...
+		 */
+		if (ring->sched.ops)
 			drm_sched_fini(&ring->sched);
 
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0
Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 01/02/2023 13:21, Luben Tuikov wrote:
> Hi Guilherme,
>
> Since setting sched->ready to false, seems to be taking place in,
> directly amdgpu_ring_fini() and in amdgpu_fence_driver_sw_fini()
> indirectly as that function calls drm_sched_fini() which sets it to
> false, we seem to have two competing policies of, "set ready to false
> to show that _fini() was called, and set to false to disable IB
> submissions".
>
> To that effect, your patch is generally correct, as it would be the
> case of an early failure and unroll from (indirectly)
> amdgpu_device_init_schedulers().
>
> Please resubmit your patch but using .ops as Christian suggested, as
> .name is sufficient, but .ops is necessary.
>
> On a side-note: in the future we should probably discern between
> "this ring has an initialized and working scheduler" (looking up at
> DRM), from "this ring can take on IBs to send them down to the
> hardware" (looking down at hardware). Sched->ready seems to be
> overloaded with these disparate states, and this is why you need to
> use .ops to guard calling drm_sched_fini().
>
> Regards,
> Luben

Thanks a lot Luben, makes perfect sense! Also, thanks to everyone that
provided feedback here, very interesting discussion.

Submitted V2:
https://lore.kernel.org/dri-devel/20230201164814.1353383-1-gpicc...@igalia.com/

Cheers,

Guilherme
[PATCH v2] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck recently,
and the function drm_sched_fini() was called even without its
counterpart having been previously called, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
[...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Notice ideally we'd use sched.ready for that; such field is set as the
latest thing on drm_sched_init(). But amdgpu seems to "override" the
meaning of such field - in the above oops for example, it was a GFX ring
causing the crash, and the sched.ready field was set to true in the ring
init routine, regardless of the state of the DRM scheduler. Hence, we
ended-up using sched.ops as per Christian's suggestion [0].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb...@amd.com/

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König
Cc: Guchun Chen
Cc: Luben Tuikov
Cc: Mario Limonciello
Signed-off-by: Guilherme G. Piccoli
---

 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..3b962cb680a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev)
 		if (!ring || !ring->fence_drv.initialized)
 			continue;
 
-		if (!ring->no_scheduler)
+		/*
+		 * Notice we check for sched.ops since there's some
+		 * override on the meaning of sched.ready by amdgpu.
+		 * The natural check would be sched.ready, which is
+		 * set as drm_sched_init() finishes...
+		 */
+		if (!ring->no_scheduler && ring->sched.ops)
 			drm_sched_fini(&ring->sched);
 
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0
Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 31/01/2023 14:52, Alex Deucher wrote: > [...] >> (b) We can't use sched.ready, which would make sense...but amdgpu >> overrides its meaning, the driver manipulates this value for its own >> purposes of tracking ring init, or something like that. >> >> This is the tangential topic: what should we do here? My understanding >> of Alex's message is that we could have a "ready" field in the ring >> structure and stop messing with sched.ready - does it make sense Alex? > > Yes, I think so. The tricky part will be figuring out which current > sched.ready checks are checking for the scheduler being ready vs. > whether the ring itself is ready. > Thanks, makes sense! $ grep -nr "sched.ready" drivers/gpu/drm/amd/ | wc -l 83 Maybe not super tough, I hope heh >> >> Guchun / Christian, does it also make sense for you? >> >> >> Regarding (a), I could re-submit having s/sched.name/sched.ops, no >> biggies, I tested both to be fair, before sending...I just chose name >> but any field that is proper initialized on drm_sched_init() would work. > > Yeah, I think ops is fine. You could even use sched.ready after we > clean up the use of that in the driver. There are already a bunch of > places where we check sched.ready to check if the scheduler really is > ready. Hmm... unfortunately, it doesn't work. This was a case in which sched.ready was set to true in the ring init routine, but the scheduler wasn't properly initialized. So, a different key for comparison is required... I'll re-submit with sched.ops. After a potential rework of the driver to get rid of the sched.ready manipulation, it could then be fixed to properly use this flag... makes sense to you? Thanks again for the prompt review! Cheers, Guilherme
Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
On 31/01/2023 10:58, Chen, Guchun wrote: > Hi Christian, > > Do you think if it makes sense that we can set 'ring->sched.ready' to be true > in each ring init, even if before executing/setting up drm_sched_init in > amdgpu_device_init_schedulers? As 'ready' is a member of gpu scheduler > structure. > > Regards, > Guchun > Hi folks, thanks a lot for the feedback so far, much appreciated! I'm feeling a bit confused, especially since there seem to be 2 orthogonal (yet related) topics being discussed; let me try to summarize my understanding and we can then further discuss better: (a) The first problem is the one addressed in the patch - how to prevent drm_sched_fini() from getting called if drm_sched_init() wasn't called? I've proposed sched.name, seems Christian prefers sched.ops, correct? (b) We can't use sched.ready, which would make sense...but amdgpu overrides its meaning, the driver manipulates this value for its own purposes of tracking ring init, or something like that. This is the tangential topic: what should we do here? My understanding of Alex's message is that we could have a "ready" field in the ring structure and stop messing with sched.ready - does it make sense, Alex? Guchun / Christian, does it also make sense for you? Regarding (a), I could re-submit having s/sched.name/sched.ops, no biggies, I tested both to be fair, before sending...I just chose name but any field that is properly initialized on drm_sched_init() would work. Thanks, Guilherme
[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini routine - such function is expected to be called only after the respective init function - drm_sched_init() - was executed successfully. Happens that we faced a driver probe failure in the Steam Deck recently, and the function drm_sched_fini() was called even though its counterpart had not been previously called, causing the following oops: amdgpu: probe of 0000:04:00.0 failed with error -110 BUG: kernel NULL pointer dereference, address: 0000000000000090 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace: amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] amdgpu_driver_release_kms+0x16/0x30 [amdgpu] devm_drm_dev_init_release+0x49/0x70 [...] To prevent that, check if the drm_sched was properly initialized for a given ring before calling its fini counterpart. Notice ideally we'd use sched.ready for that; such field is set as the latest thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such field - in the above oops for example, it was a GFX ring causing the crash, and the sched.ready field was set to true in the ring init routine, regardless of the state of the DRM scheduler. Hence, we ended up using another sched field. Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)") Cc: Andrey Grodzovsky Cc: Guchun Chen Cc: Mario Limonciello Signed-off-by: Guilherme G. Piccoli --- Hi folks, first of all thanks in advance for reviews / comments! Notice that I've used the Fixes tag more in the sense of bringing it to stable; I didn't find a good patch candidate that added the call to drm_sched_fini(), was reaching way too old commits...so 067f44c8b459 seems a good candidate - or maybe not? 
Now, with regard to sched.ready, I spent a bit of time figuring out what was happening... would it be feasible maybe to stop using that to mark some kind of ring status? I think it should be possible to add a flag to the ring structure for that, and free sched.ready from being manipulated by the amdgpu driver - what are your thoughts on that? I could try myself, but first of course I'd like to raise the "temperature" on this topic and check if somebody is already working on that. Cheers, Guilherme drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 00444203220d..e154eb8241fb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev) if (!ring || !ring->fence_drv.initialized) continue; - if (!ring->no_scheduler) + /* + * Notice we check for sched.name since there's some + * override on the meaning of sched.ready by amdgpu. + * The natural check would be sched.ready, which is + * set as drm_sched_init() finishes... + */ + if (!ring->no_scheduler && ring->sched.name) drm_sched_fini(&ring->sched); for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j) -- 2.39.0
Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini
+ Luben (sorry, missed that in the first submission). On 30/01/2023 18:45, Guilherme G. Piccoli wrote: > Currently amdgpu calls drm_sched_fini() from the fence driver sw fini > routine - such function is expected to be called only after the > respective init function - drm_sched_init() - was executed successfully. > > Happens that we faced a driver probe failure in the Steam Deck > recently, and the function drm_sched_fini() was called even without > its counter-part had been previously called, causing the following oops: > > amdgpu: probe of 0000:04:00.0 failed with error -110 > BUG: kernel NULL pointer dereference, address: 0000000000000090 > PGD 0 P4D 0 > Oops: 0002 [#1] PREEMPT SMP NOPTI > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338 > Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022 > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] > [...] > Call Trace: > > amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu] > amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu] > amdgpu_driver_release_kms+0x16/0x30 [amdgpu] > devm_drm_dev_init_release+0x49/0x70 > [...] > > To prevent that, check if the drm_sched was properly initialized for a > given ring before calling its fini counter-part. > > Notice ideally we'd use sched.ready for that; such field is set as the latest > thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such > field - in the above oops for example, it was a GFX ring causing the crash, > and > the sched.ready field was set to true in the ring init routine, regardless of > the state of the DRM scheduler. Hence, we ended-up using another sched field. > > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in > s3 test (v2)") > Cc: Andrey Grodzovsky > Cc: Guchun Chen > Cc: Mario Limonciello > Signed-off-by: Guilherme G. Piccoli > --- > > > Hi folks, first of all thanks in advance for reviews / comments! 
> Notice that I've used the Fixes tag more in the sense to bring it > to stable, I didn't find a good patch candidate that added the > call to drm_sched_fini(), was reaching way too old commits...so > 067f44c8b459 seems a good candidate - or maybe not? > > Now, with regards sched.ready, spent a bit of time to figure what > was happening...would be feasible maybe to stop using that to > mark some kind ring status? I think it should be possible to add a > flag to the ring structure for that, and free sched.ready from > being manipulate by the amdgpu driver, what's your thoughts on that? > > I could try myself, but first of course I'd like to raise the > "temperature" on this topic and check if somebody is already working > on that. > > Cheers, > > Guilherme > > > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > index 00444203220d..e154eb8241fb 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device > *adev) > if (!ring || !ring->fence_drv.initialized) > continue; > > - if (!ring->no_scheduler) > + /* > + * Notice we check for sched.name since there's some > + * override on the meaning of sched.ready by amdgpu. > + * The natural check would be sched.ready, which is > + * set as drm_sched_init() finishes... > + */ > + if (!ring->no_scheduler && ring->sched.name) > drm_sched_fini(&ring->sched); > > for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
[PATCH] drm/amd/display: Trivial swizzle-related code clean-ups
This is a very trivial code clean-up related to commit 5468c36d6285 ("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS"). This commit added a validation on driver probe to prevent invalid TMDS modes, but one of the fake properties (swizzle) ended up causing a warning on driver probe; was reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/2264. It was fixed by commit 105a8b8698e2 ("drm/amd/display: patch cases with unknown plane state to prevent warning"), but the validation code had a double variable assignment, which we hereby remove. Also, the fix relies on the dcn2{0,1}_patch_unknown_plane_state() callbacks, so while at it we took the opportunity to perform a small code clean-up in such routines. Cc: Aurabindo Pillai Cc: Daniel Wheeler Cc: Fangzhi Zuo Cc: Harry Wentland Cc: Leo Li Cc: Mark Broadworth Cc: Melissa Wen Cc: Rodrigo Siqueira Cc: Sung Joon Kim Cc: Swapnil Patel Signed-off-by: Guilherme G. Piccoli --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 8 ++------ drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 6 ++---- 3 files changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index 86a2f7f58550..e71e94663d14 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6336,7 +6336,6 @@ static enum dc_status dm_validate_stream_and_context(struct dc *dc, dc_plane_state->plane_size.surface_size.width = stream->src.width; dc_plane_state->plane_size.chroma_size.height = stream->src.height; dc_plane_state->plane_size.chroma_size.width = stream->src.width; - dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN; dc_plane_state->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB; dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN; dc_plane_state->rotation = ROTATION_ANGLE_0; diff --git 
a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c index 531f405d2554..3af24ef9cb2d 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c @@ -2225,14 +2225,10 @@ enum dc_status dcn20_patch_unknown_plane_state(struct dc_plane_state *plane_stat enum surface_pixel_format surf_pix_format = plane_state->format; unsigned int bpp = resource_pixel_format_to_bpp(surf_pix_format); - enum swizzle_mode_values swizzle = DC_SW_LINEAR; - + plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_S; if (bpp == 64) - swizzle = DC_SW_64KB_D; - else - swizzle = DC_SW_64KB_S; + plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_D; - plane_state->tiling_info.gfx9.swizzle = swizzle; return DC_OK; } diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c index fbcf0afeae0d..8f9244fe5c86 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c @@ -1393,15 +1393,13 @@ static uint32_t read_pipe_fuses(struct dc_context *ctx) static enum dc_status dcn21_patch_unknown_plane_state(struct dc_plane_state *plane_state) { - enum dc_status result = DC_OK; - if (plane_state->ctx->dc->debug.disable_dcc == DCC_ENABLE) { plane_state->dcc.enable = 1; /* align to our worst case block width */ plane_state->dcc.meta_pitch = ((plane_state->src_rect.width + 1023) / 1024) * 1024; } - result = dcn20_patch_unknown_plane_state(plane_state); - return result; + + return dcn20_patch_unknown_plane_state(plane_state); } static const struct resource_funcs dcn21_res_pool_funcs = { -- 2.39.0
Re: [PATCH 0/4] Update AMD APU IP version docs
On 18/01/2023 14:28, Mario Limonciello wrote: > Guilherme G. Piccoli submitted some patches recently to the mailing list > related to removing IP version detection code. As part of the discussion > it was mentioned that non-obviously some IP blocks are shared between > multiple APU parts. > > The example given was for launched products Rembrandt and Mendocino VCN, > but in looking at the existing table I found that we don't actually > indicate the VCN version for Mendocino. > > So I added this to the table, and then noticed we're missing references > to some other recently launched products. > > So update the table with all missing launched products as well as products > launching this year that don't have IP version changes. > > Mario Limonciello (4): > Documentation/gpu: Add MP0 version to apu-asic-info-table > Documentation/gpu: Update lines for GREEN_SARDINE and YELLOW_CARP > Documentation/gpu: Add Mendocino to apu-asic-info-table > Documentation/gpu: Add Raphael to apu-asic-info-table > > .../gpu/amdgpu/apu-asic-info-table.csv | 18 ++ > 1 file changed, 10 insertions(+), 8 deletions(-) > Thanks a lot for the improvement Mario, much appreciated! Cheers, Guilherme
Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check
On 17/01/2023 15:14, Limonciello, Mario wrote: > [Public] > > > >> -Original Message- >> From: Guilherme G. Piccoli >> Sent: Tuesday, January 17, 2023 12:14 >> To: Limonciello, Mario ; amd- >> g...@lists.freedesktop.org; Deucher, Alexander >> >> Cc: dri-de...@lists.freedesktop.org; Koenig, Christian >> ; Pan, Xinhui ; >> ker...@gpiccoli.net; kernel-...@igalia.com; Zhu, James >> ; Lazar, Lijo ; Liu, Leo >> ; Jiang, Sonny >> Subject: Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect >> SRAM HW model check >> >> On 17/01/2023 15:08, Limonciello, Mario wrote: >>> [...] >>> >>> Should have added this tag too: >>> Suggested-by: Alexander Deucher >>> >>> Looks good to me, thanks! >>> Reviewed-by: Mario Limonciello >>> >> >> You're totally right, thanks for the reminder and apologies for missing >> that! Just sending V3 heheh >> >> Ah, thanks for the reviews and prompt responses. >> Cheers, >> >> >> Guilherme > > No need to resend. Patchwork will embed the tags when we pick this up. Already did, but thanks again for the info - learning a lot in this thread =)
[PATCH v3 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check
The HW model validation that guards the indirect SRAM checking in the VCN code path is redundant - there's no model that's not included in the switch, making it useless in practice [0]. So, let's remove this switch statement for good. [0] lore.kernel.org/amd-gfx/mn0pr12mb61013d20b8a2263b22ae1bcfe2...@mn0pr12mb6101.namprd12.prod.outlook.com Suggested-by: Alex Deucher Reviewed-by: Mario Limonciello Cc: James Zhu Cc: Lazar Lijo Cc: Leo Liu Cc: Sonny Jiang Signed-off-by: Guilherme G. Piccoli --- V3: * Added Mario's review tag and Alex's suggested tag - thanks for the reminder Mario! V2: * Changed the approach after ML discussion- instead of cleaning up the switch statement, removed it entirely - special thanks to Alex and Mario for the feedback! drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 81 + 1 file changed, 3 insertions(+), 78 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 1b1a3c9e1863..02d428ddf2f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -110,84 +110,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) for (i = 0; i < adev->vcn.num_vcn_inst; i++) atomic_set(>vcn.inst[i].dpg_enc_submission_cnt, 0); - switch (adev->ip_versions[UVD_HWIP][0]) { - case IP_VERSION(1, 0, 0): - case IP_VERSION(1, 0, 1): - case IP_VERSION(2, 5, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 2, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 6, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 0, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - 
adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 0): - case IP_VERSION(3, 0, 64): - case IP_VERSION(3, 0, 192): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 16): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 33): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 1, 1): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 1, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(4, 0, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(4, 0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_s
[PATCH v3 1/2] drm/amdgpu/vcn: Adjust firmware names indentation
This is an incredibly trivial fix, just for the sake of "aesthetical" organization of the defines. Some were space based, most were tab based and there was a lack of "alignment", now it's all the same and aligned. Cc: James Zhu Cc: Lazar Lijo Cc: Leo Liu Cc: Mario Limonciello Cc: Sonny Jiang Reviewed-by: Alex Deucher Signed-off-by: Guilherme G. Piccoli --- V2/V3: * Added Alex's review tag - thanks! drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 - 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index f8397d993f23..1b1a3c9e1863 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -36,26 +36,26 @@ #include "soc15d.h" /* Firmware Names */ -#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin" -#define FIRMWARE_PICASSO "amdgpu/picasso_vcn.bin" -#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin" -#define FIRMWARE_ARCTURUS "amdgpu/arcturus_vcn.bin" -#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin" -#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin" -#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin" -#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin" -#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin" -#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin" -#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin" -#define FIRMWARE_VANGOGH "amdgpu/vangogh_vcn.bin" +#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin" +#define FIRMWARE_PICASSO "amdgpu/picasso_vcn.bin" +#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin" +#define FIRMWARE_ARCTURUS "amdgpu/arcturus_vcn.bin" +#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin" +#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin" +#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin" +#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin" +#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin" +#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin" +#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin" 
+#define FIRMWARE_VANGOGH "amdgpu/vangogh_vcn.bin" #define FIRMWARE_DIMGREY_CAVEFISH "amdgpu/dimgrey_cavefish_vcn.bin" -#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin" -#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin" -#define FIRMWARE_YELLOW_CARP "amdgpu/yellow_carp_vcn.bin" -#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin" -#define FIRMWARE_VCN4_0_0 "amdgpu/vcn_4_0_0.bin" -#define FIRMWARE_VCN4_0_2 "amdgpu/vcn_4_0_2.bin" -#define FIRMWARE_VCN4_0_4 "amdgpu/vcn_4_0_4.bin" +#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin" +#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin" +#define FIRMWARE_YELLOW_CARP "amdgpu/yellow_carp_vcn.bin" +#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin" +#define FIRMWARE_VCN4_0_0 "amdgpu/vcn_4_0_0.bin" +#define FIRMWARE_VCN4_0_2 "amdgpu/vcn_4_0_2.bin" +#define FIRMWARE_VCN4_0_4 "amdgpu/vcn_4_0_4.bin" MODULE_FIRMWARE(FIRMWARE_RAVEN); MODULE_FIRMWARE(FIRMWARE_PICASSO); -- 2.39.0
Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check
On 17/01/2023 15:08, Limonciello, Mario wrote: > [...] > > Should have added this tag too: > Suggested-by: Alexander Deucher > > Looks good to me, thanks! > Reviewed-by: Mario Limonciello > You're totally right, thanks for the reminder and apologies for missing that! Just sending V3 heheh Ah, thanks for the reviews and prompt responses. Cheers, Guilherme
[PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check
The HW model validation that guards the indirect SRAM checking in the VCN code path is redundant - there's no model that's not included in the switch, making it useless in practice [0]. So, let's remove this switch statement for good. [0] lore.kernel.org/amd-gfx/mn0pr12mb61013d20b8a2263b22ae1bcfe2...@mn0pr12mb6101.namprd12.prod.outlook.com Cc: James Zhu Cc: Lazar Lijo Cc: Leo Liu Cc: Mario Limonciello Cc: Sonny Jiang Signed-off-by: Guilherme G. Piccoli --- V2: * Changed the approach after ML discussion- instead of cleaning up the switch statement, removed it entirely - special thanks to Alex and Mario for the feedback! Notice that patch 3 was dropped from this series after reviews. drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 81 + 1 file changed, 3 insertions(+), 78 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index 1b1a3c9e1863..02d428ddf2f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -110,84 +110,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev) for (i = 0; i < adev->vcn.num_vcn_inst; i++) atomic_set(>vcn.inst[i].dpg_enc_submission_cnt, 0); - switch (adev->ip_versions[UVD_HWIP][0]) { - case IP_VERSION(1, 0, 0): - case IP_VERSION(1, 0, 1): - case IP_VERSION(2, 5, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 2, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 6, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 0, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(2, 
0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 0): - case IP_VERSION(3, 0, 64): - case IP_VERSION(3, 0, 192): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 16): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 0, 33): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 1, 1): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(3, 1, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(4, 0, 0): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case IP_VERSION(4, 0, 2): - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) && - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG)) - adev->vcn.indirect_sram = true; - break; - case I
[PATCH v2 1/2] drm/amdgpu/vcn: Adjust firmware names indentation
This is an incredibly trivial fix, just for the sake of "aesthetical" organization of the defines. Some were space based, most were tab based and there was a lack of "alignment", now it's all the same and aligned. Cc: James Zhu Cc: Lazar Lijo Cc: Leo Liu Cc: Mario Limonciello Cc: Sonny Jiang Reviewed-by: Alex Deucher Signed-off-by: Guilherme G. Piccoli --- V2: * Added Alex's review tag - thanks! drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 - 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c index f8397d993f23..1b1a3c9e1863 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -36,26 +36,26 @@ #include "soc15d.h" /* Firmware Names */ -#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin" -#define FIRMWARE_PICASSO "amdgpu/picasso_vcn.bin" -#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin" -#define FIRMWARE_ARCTURUS "amdgpu/arcturus_vcn.bin" -#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin" -#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin" -#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin" -#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin" -#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin" -#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin" -#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin" -#define FIRMWARE_VANGOGH "amdgpu/vangogh_vcn.bin" +#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin" +#define FIRMWARE_PICASSO "amdgpu/picasso_vcn.bin" +#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin" +#define FIRMWARE_ARCTURUS "amdgpu/arcturus_vcn.bin" +#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin" +#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin" +#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin" +#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin" +#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin" +#define FIRMWARE_SIENNA_CICHLID"amdgpu/sienna_cichlid_vcn.bin" +#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin" 
+#define FIRMWARE_VANGOGH "amdgpu/vangogh_vcn.bin" #define FIRMWARE_DIMGREY_CAVEFISH "amdgpu/dimgrey_cavefish_vcn.bin" -#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin" -#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin" -#define FIRMWARE_YELLOW_CARP "amdgpu/yellow_carp_vcn.bin" -#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin" -#define FIRMWARE_VCN4_0_0 "amdgpu/vcn_4_0_0.bin" -#define FIRMWARE_VCN4_0_2 "amdgpu/vcn_4_0_2.bin" -#define FIRMWARE_VCN4_0_4 "amdgpu/vcn_4_0_4.bin" +#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin" +#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin" +#define FIRMWARE_YELLOW_CARP "amdgpu/yellow_carp_vcn.bin" +#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin" +#define FIRMWARE_VCN4_0_0 "amdgpu/vcn_4_0_0.bin" +#define FIRMWARE_VCN4_0_2 "amdgpu/vcn_4_0_2.bin" +#define FIRMWARE_VCN4_0_4 "amdgpu/vcn_4_0_4.bin" MODULE_FIRMWARE(FIRMWARE_RAVEN); MODULE_FIRMWARE(FIRMWARE_PICASSO); -- 2.39.0
Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode
On 17/01/2023 13:24, Limonciello, Mario wrote: > [...] >>> Though I see two problems with that: first, I'm not sure what's the >>> impact in the GPU functioning when I disable some IP block. >>> > > It depends on the individual block what the impact is. For example > if you don't have VCN, then you can't do any accelerated video playback. > >>> Second, the parameter is a bit hard to figure - we need to clear a bit >>> for the IP block we want to disable, and the doc suggest to read on >>> dmesg to get this information (it seems it changes depending on the HW >>> model), but I couldn't parse the proper bit from dmesg. Needed to >>> instrument the kernel to find the proper bit heh >>> > > Isn't it this stuff (taken from a CZN system): > > [7.797779] [drm] add ip block number 0 > [7.797781] [drm] add ip block number 1 > [7.797782] [drm] add ip block number 2 > [7.797783] [drm] add ip block number 3 > [7.797783] [drm] add ip block number 4 > [7.797784] [drm] add ip block number 5 > [7.797785] [drm] add ip block number 6 > [7.797786] [drm] add ip block number 7 > [7.797787] [drm] add ip block number 8 > [7.797788] [drm] add ip block number 9 > > So for that system it would be bit 8 to disable vcn. > > In terms of how debugging would work: > I would expect when you get your failure it will have been the previous > block # that failed, and so you can reboot with that block masked and > see if you get further. > Thanks Mario, much appreciated! You're totally right and I messed up not seeing these obvious messages... So, I'll just drop the parameter on V2. Cheers, Guilherme
Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode
On 16/01/2023 23:33, Limonciello, Mario wrote:
> [...]
>
> For debugging these type of problems, I think an effective debugging
> tactic would have been to mask the IP block (amdgpu.ip_block_mask).

Thank you, it worked indeed - nice suggestion!

Though I see two problems with that: first, I'm not sure what's the
impact in the GPU functioning when I disable some IP block.

Second, the parameter is a bit hard to figure - we need to clear a bit
for the IP block we want to disable, and the doc suggest to read on
dmesg to get this information (it seems it changes depending on the HW
model), but I couldn't parse the proper bit from dmesg. Needed to
instrument the kernel to find the proper bit heh

The second part is easy to improve (we can just show this bit in
dmesg!), I might do that instead of proposing this parameter, which it
seems didn't raise much excitement after all heheh

Finally, I'm still curious on why Deck was working fine with the
indirect SRAM mode disabled (by mistake) in many kernels - was it in
practice the same as disabling the VCN IP block?

Thanks,

Guilherme
Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode
On 16/01/2023 20:00, Alex Deucher wrote:
> [...]
>
> It's not clear to me when this would be used. We only disable it
> briefly during new asic bring up, after that we never touch it again.
> No end user on production hardware should ever have to change it and
> doing so would break VCN on their system.
>
> Alex

[+ Pierre-Loup]

Steam Deck is facing a pretty weird scenario then heheh

Commit 82132ecc543 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
corrected a long-term issue in which the indirect SRAM mode wasn't
enabled for Vangogh - and the Deck GPU architecture is Vangogh, so it
was working perfectly with that disabled.

Happens that a bad FW candidate seems to have broken something - it was
a bit convoluted to debug, but we proved that disabling indirect SRAM is
a good workaround so far; it "restored the behavior" pre-82132ecc543.

Hence my proposal - this parameter would've made life so much easier,
and we're starting to use it "downstream" now. While I understand that
of course the FW should be fixed, meanwhile this is a cheap solution to
allow further debugging and real use of the system.

Thanks,

Guilherme
Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode
On 16/01/2023 19:27, Liu, Leo wrote:
> Secure part requires PSP load VCN boot sequence which is with indirect
> sram mode.
>
> Regards,
> Leo

With that said, and with the plan of just setting the indirect SRAM mode
unconditionally (with no switch/cases anymore - I'll submit the V2
tomorrow if everybody agrees), does anybody oppose having this parameter
for debug reasons?

I first considered doing it through debugfs, but that was obviously a
(very) bad idea, since this should be an early boot tuning heheh

Cheers,

Guilherme
Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code
On 16/01/2023 18:41, Alex Deucher wrote: > [...] >> + case IP_VERSION(4, 0, 0): /* Vega10 */ > > This comment is incorrect. Vega10 does not have VCN (it has UVD and VCE). > > Alex Thanks Alex! I'll resubmit V2 with this comment removed. I've stolen it from https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c#L1147 , but obviously I misunderstood the mapping here heheh Cheers, Guilherme
Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code
On 16/01/2023 18:49, Limonciello, Mario wrote: > [...] > > The problem with adding a comment about which platform it's in is that > this will probably go stale. Some IP blocks are re-used in multiple ASICs. > > VCN 3, 1, 1 is a great example. This is used in both Yellow Carp (Rembrandt) > as well > as Mendocino. I would think a better approach would be to have a single > point > of documentation somewhere in the source tree that we can update after new > ASICs launch that could act as a decoder ring. > > It's effort to remember to update it, but it's at least a single point of > failure. > Hi Mario, thanks - it makes a lot of sense! I can remove the comments in the V2, I'll wait a bit more for (potential) extra feedback but I agree with you, so I'm inclined to remove them. Cheers, Guilherme
[PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code
Currently both conditionals checked to enable indirect SRAM are repeated
for all HW/IP models. Let's just simplify it a bit so code is more
compact and readable.

While at it, add the legacy names as a comment per case block, to help
whoever is reading the code and doesn't have much experience with the
name/number mapping.

Cc: James Zhu
Cc: Lazar Lijo
Cc: Leo Liu
Cc: Mario Limonciello
Cc: Sonny Jiang
Signed-off-by: Guilherme G. Piccoli
---

Hi folks, first of all thanks in advance for reviews/comments!

This work is based on the agd5f/amd-staging-drm-next branch - there is
this patch from Mario that refactored amdgpu_vcn.c, and since it dropped
the legacy names from the switch cases, I've decided to also include
them here as comments. I'm not sure if that's a good idea - it feels
good for somebody not so used to the code to read the codenames instead
of purely numbers - but if you wanna move away from the legacy names for
good, lemme know and I can rework without these comments.

Cheers,

Guilherme

 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 84 +
 1 file changed, 16 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1b1a3c9e1863..1f880e162d9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -111,78 +111,26 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 		atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);

 	switch (adev->ip_versions[UVD_HWIP][0]) {
-	case IP_VERSION(1, 0, 0):
-	case IP_VERSION(1, 0, 1):
-	case IP_VERSION(2, 5, 0):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(2, 2, 0):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(2, 6, 0):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(2, 0, 0):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(2, 0, 2):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(3, 0, 0):
-	case IP_VERSION(3, 0, 64):
-	case IP_VERSION(3, 0, 192):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(3, 0, 2):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(3, 0, 16):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(3, 0, 33):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
-	case IP_VERSION(3, 1, 1):
-		if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
-		    (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
-			adev->vcn.indirect_sram = true;
-		break;
+	case IP_VERSION(1, 0, 0): /* Raven (1/2) / Picasso */
+	case IP_VERSION(1, 0, 1): /* Raven (1/2) / Picasso */
+	case IP_VERSION(2, 0, 0): /* Navi10 */
+	case IP_VERSION(2, 0, 2): /* Navi12 / Navi14 */
+	case IP_VERSION(2, 2, 0): /* Renoir / Green Sardine */
+	case IP_VERSION(2, 5, 0): /* Arcturus */
+	case IP_VERSION(2, 6, 0): /* Aldebaran */
+	case IP_VERSION(3, 0, 0): /* Sienna Cichlid / Navy Flounder */
+	case IP_VERSION(3, 0, 2): /* Vangogh */
+	case IP_VERSION(3, 0,
[PATCH 1/3] drm/amdgpu/vcn: Adjust firmware names indentation
This is an incredibly trivial fix, just for the sake of "aesthetical"
organization of the defines. Some were space based, most were tab based
and there was a lack of "alignment" - now it's all the same and aligned.

Cc: James Zhu
Cc: Lazar Lijo
Cc: Leo Liu
Cc: Mario Limonciello
Cc: Sonny Jiang
Signed-off-by: Guilherme G. Piccoli
---

 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 38 -
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index f8397d993f23..1b1a3c9e1863 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -36,26 +36,26 @@
 #include "soc15d.h"

 /* Firmware Names */
-#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
-#define FIRMWARE_PICASSO "amdgpu/picasso_vcn.bin"
-#define FIRMWARE_RAVEN2	"amdgpu/raven2_vcn.bin"
-#define FIRMWARE_ARCTURUS "amdgpu/arcturus_vcn.bin"
-#define FIRMWARE_RENOIR	"amdgpu/renoir_vcn.bin"
-#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
-#define FIRMWARE_NAVI10	"amdgpu/navi10_vcn.bin"
-#define FIRMWARE_NAVI14	"amdgpu/navi14_vcn.bin"
-#define FIRMWARE_NAVI12	"amdgpu/navi12_vcn.bin"
-#define FIRMWARE_SIENNA_CICHLID	"amdgpu/sienna_cichlid_vcn.bin"
-#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
-#define FIRMWARE_VANGOGH "amdgpu/vangogh_vcn.bin"
+#define FIRMWARE_RAVEN		"amdgpu/raven_vcn.bin"
+#define FIRMWARE_PICASSO	"amdgpu/picasso_vcn.bin"
+#define FIRMWARE_RAVEN2		"amdgpu/raven2_vcn.bin"
+#define FIRMWARE_ARCTURUS	"amdgpu/arcturus_vcn.bin"
+#define FIRMWARE_RENOIR		"amdgpu/renoir_vcn.bin"
+#define FIRMWARE_GREEN_SARDINE	"amdgpu/green_sardine_vcn.bin"
+#define FIRMWARE_NAVI10		"amdgpu/navi10_vcn.bin"
+#define FIRMWARE_NAVI14		"amdgpu/navi14_vcn.bin"
+#define FIRMWARE_NAVI12		"amdgpu/navi12_vcn.bin"
+#define FIRMWARE_SIENNA_CICHLID	"amdgpu/sienna_cichlid_vcn.bin"
+#define FIRMWARE_NAVY_FLOUNDER	"amdgpu/navy_flounder_vcn.bin"
+#define FIRMWARE_VANGOGH	"amdgpu/vangogh_vcn.bin"
 #define FIRMWARE_DIMGREY_CAVEFISH "amdgpu/dimgrey_cavefish_vcn.bin"
-#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
-#define FIRMWARE_BEIGE_GOBY	"amdgpu/beige_goby_vcn.bin"
-#define FIRMWARE_YELLOW_CARP "amdgpu/yellow_carp_vcn.bin"
-#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
-#define FIRMWARE_VCN4_0_0 "amdgpu/vcn_4_0_0.bin"
-#define FIRMWARE_VCN4_0_2 "amdgpu/vcn_4_0_2.bin"
-#define FIRMWARE_VCN4_0_4 "amdgpu/vcn_4_0_4.bin"
+#define FIRMWARE_ALDEBARAN	"amdgpu/aldebaran_vcn.bin"
+#define FIRMWARE_BEIGE_GOBY	"amdgpu/beige_goby_vcn.bin"
+#define FIRMWARE_YELLOW_CARP	"amdgpu/yellow_carp_vcn.bin"
+#define FIRMWARE_VCN_3_1_2	"amdgpu/vcn_3_1_2.bin"
+#define FIRMWARE_VCN4_0_0	"amdgpu/vcn_4_0_0.bin"
+#define FIRMWARE_VCN4_0_2	"amdgpu/vcn_4_0_2.bin"
+#define FIRMWARE_VCN4_0_4	"amdgpu/vcn_4_0_4.bin"

 MODULE_FIRMWARE(FIRMWARE_RAVEN);
 MODULE_FIRMWARE(FIRMWARE_PICASSO);
-- 
2.39.0
[PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode
Currently the FW loading path performs some checks based on the IP model
and, in case it is advertised as supported, the VCN indirect SRAM mode
is used. Happens that in case there's any issue on FW and this mode
ends-up not being properly supported, the driver probe fails [0].

Debugging this requires driver rebuilding, so to allow fast debug and
experiments, add a parameter to force setting indirect SRAM mode to
true/false from the kernel command-line; the parameter default is -1,
which doesn't change the current driver's behavior.

[0] Example of this issue, observed on Steam Deck:

[drm] kiq ring mec 2 pipe 1 q 0
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x)
amdgpu :04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block failed -110
amdgpu :04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu :04:00.0: amdgpu: Fatal error during GPU init

Cc: James Zhu
Cc: Lazar Lijo
Cc: Leo Liu
Cc: Mario Limonciello
Cc: Sonny Jiang
Signed-off-by: Guilherme G. Piccoli
---

This work is based on the agd5f/amd-staging-drm-next branch.
Thanks in advance for reviews/comments!
Cheers,

Guilherme

 drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 872450a3a164..5d3c92c94f18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -215,6 +215,7 @@ extern int amdgpu_noretry;
 extern int amdgpu_force_asic_type;
 extern int amdgpu_smartshift_bias;
 extern int amdgpu_use_xgmi_p2p;
+extern int amdgpu_indirect_sram;
 #ifdef CONFIG_HSA_AMD
 extern int sched_policy;
 extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 06aba201d4db..c7182c0bc841 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -187,6 +187,7 @@ int amdgpu_num_kcq = -1;
 int amdgpu_smartshift_bias;
 int amdgpu_use_xgmi_p2p = 1;
 int amdgpu_vcnfw_log;
+int amdgpu_indirect_sram = -1;

 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);

@@ -941,6 +942,14 @@ MODULE_PARM_DESC(smu_pptable_id,
 	"specify pptable id to be used (-1 = auto(default) value, 0 = use pptable from vbios, > 0 = soft pptable id)");
 module_param_named(smu_pptable_id, amdgpu_smu_pptable_id, int, 0444);

+/**
+ * DOC: indirect_sram (int)
+ * Allow users to force using (or not) the VCN indirect SRAM mode in the fw load
+ * code. Default is -1, meaning auto (aka, don't mess with driver's behavior).
+ */
+MODULE_PARM_DESC(indirect_sram, "Force VCN indirect SRAM (-1 = auto (default), 0 = disabled, 1 = enabled)");
+module_param_named(indirect_sram, amdgpu_indirect_sram, int, 0444);
+
 /* These devices are not supported by amdgpu.
  * They are supported by the mach64, r128, radeon drivers
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 1f880e162d9d..a2290087e01c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -137,6 +137,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device *adev)
 		return -EINVAL;
 	}

+	if (amdgpu_indirect_sram >= 0)
+		adev->vcn.indirect_sram = (bool)amdgpu_indirect_sram;
+
 	hdr = (const struct common_firmware_header *)adev->vcn.fw->data;
 	adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
-- 
2.39.0
[PATCH] drm/amd/display: Fix warning caused by a bad fake property
Commit 5468c36d6285 ("drm/amd/display: Filter Invalid 420 Modes for HDMI
TMDS") added a validation on driver probe to prevent invalid TMDS modes.
For that, it requires faking some plane properties before submitting
them to the validation routines.

Happens that one of such fake properties (swizzle) causes a boot-time
warning due to an assert on the swizzle routines. Since it's a fake
property, change it to a valid value to prevent such warning. Also,
while at it, fix the unnecessary double assignment of this same property
in the code.

Fixes: 5468c36d6285 ("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2264
Reported-by: Melissa Wen
Cc: Fangzhi Zuo
Cc: Harry Wentland
Cc: Leo Li
Cc: Mark Broadworth
Cc: Rodrigo Siqueira
Cc: Sung Joon Kim
Signed-off-by: Guilherme G. Piccoli
---

Hi folks, notice my choice here was to set swizzle to DC_SW_LINEAR; I
considered creating a new enum member (like "DC_SW_FAKE", or something
like that) but it seems easier to just use the LINEAR one. In my (very
cheap!) understanding, this shouldn't affect the purpose of the TMDS
validation. Lemme know if there's something to improve here, and thanks
in advance for reviews/comments.
Cheers,

Guilherme

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1b7f20a9d4ae..35ab39fd9345 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6261,9 +6261,8 @@ static enum dc_status dm_validate_stream_and_context(struct dc *dc,
 	dc_plane_state->plane_size.surface_size.width = stream->src.width;
 	dc_plane_state->plane_size.chroma_size.height = stream->src.height;
 	dc_plane_state->plane_size.chroma_size.width = stream->src.width;
-	dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN;
 	dc_plane_state->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB;
-	dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN;
+	dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_LINEAR;
 	dc_plane_state->rotation = ROTATION_ANGLE_0;
 	dc_plane_state->is_tiling_rotated = false;
 	dc_plane_state->tiling_info.gfx8.array_mode = DC_ARRAY_LINEAR_GENERAL;
-- 
2.39.0
Re: [PATCH 2/3] drm/amdgpu: stop freeing PSP buffers during suspend
Hi Christian, thanks for the fix! It worked fine on Steam Deck running the game Cuphead, feel free to add my: Tested-by: Guilherme G. Piccoli
Re: [PATCH 3/3] drm/amdgpu: WARN when freeing kernel memory during suspend
Hey Christian, very interesting idea! I've tested it alone (without patch 2) and was able to see the PSP code freeing buffers during suspend, leading to a resume failure later. Feel free to add my: Tested-by: Guilherme G. Piccoli Cheers, Guilherme
Re: [PATCH] drm/amdgpu/psp: don't free PSP buffers on suspend
Thanks for the fix, feel free to add my: Tested-by: Guilherme G. Piccoli
Re: Reuse framebuffer after a kexec (amdgpu / efifb)
On 10/12/2021 11:16, Alex Deucher wrote:
> [...]
> Why not just reload the driver after kexec?
>
> Alex

Because the original issue is the kdump case, and we want a very very
tiny kernel - also, the crash originally could have been caused by
amdgpu itself, so if it's a GPU issue, we don't want to mess with that
in kdump.

And I confess I tried modprobe amdgpu after a kdump, with no success -
kdump won't call shutdown handlers, so the GPU will be in a "rogue"
state...

My question was about regular kexec because it's usually much simpler -
we can do whatever we want there. My line of thought was: if I make it
work in regular kexec with a simple framebuffer, I might be able to get
it working on kdump heheh
Re: Reuse framebuffer after a kexec (amdgpu / efifb)
On 10/12/2021 12:13, Christian König wrote:
> [...]
> How about issuing a PCIe reset and re-initializing the ASIC with just
> the VBIOS?
>
> That should be pretty straightforward I think.
>
> Christian.

Thanks Christian, that'd be perfect! Is it feasible? Per Alex's comment,
we'd need to run atombios commands to reprogram the timings, display
info, etc. - like a small driver would do, a full init.

Also, what kind of PCIe reset is recommended for this adapter? Like a
hot reset, powering-off/re-powering, an FLR, or that MODE2 reset present
in the amdgpu code? Remembering this is an APU device.

Thanks a lot!
Re: Reuse framebuffer after a kexec (amdgpu / efifb)
Thanks a lot Alex / Gerd and Thomas, very informative stuff!

I'm glad there are projects to collect/save the data and reuse it after
a kdump, this is very useful.

I'll continue my study on the atombios thing of AMD and QXL; maybe at
least we can make it work in qemu, that'd be great (like a small init
driver to reprogram the paravirtual device on kexec boot).

Cheers,

Guilherme
Re: Reuse framebuffer after a kexec (amdgpu / efifb)
Thanks again Alex! Some comments inlined below:

On 09/12/2021 15:06, Alex Deucher wrote:
> Not really in a generic way. It's asic and platform specific. In
> addition most modern displays require link training to bring up the
> display, so you can't just save and restore registers.

Oh sure, I understand that. My question is more like: is there a way,
inside the amdgpu driver, to save this state before taking
over/overwriting/reprogramming the device? So we could (again, from
inside the amdgpu driver) dump this pre-saved state in the shutdown
handler, for example, having the device in a "pre-OS" state when the new
kexec'ed kernel starts.

>
> The drivers are asic and platform specific. E.g., the driver for
> vangogh is different from renoir is different from skylake, etc. The
> display programming interfaces are asic specific.

Cool, that makes sense! But if you (or anybody here) know some of these
GOP drivers, e.g. for the qemu/qxl device, I'm just curious to
see/understand how complex the FW driver is, to just put the
device/screen in a usable state.

Cheers,

Guilherme
Re: Reuse framebuffer after a kexec (amdgpu / efifb)
On 09/12/2021 14:31, Alex Deucher wrote: > [...] > Once the driver takes over, none of the pre-driver state is retained. > You'll need to load the driver in the new kernel to initialize the > displays. Note the efifb doesn't actually have the ability to program > any hardware, it just takes over the memory region that was used for > the pre-OS framebuffer and whatever display timing was set up by the > GOP driver prior to the OS loading. Once that OS driver has loaded > the area is gone and the display configuration may have changed. > Hi Christian and Alex, thanks for the clarifications! Is there any way to save/retain this state before amdgpu takes over? Would simpledrm be able to program the device again, to a working state? Finally, do you have any example of such a GOP driver (open source) so I can take a look? I tried to find something like that in Tianocore project, but didn't find anything that seemed useful for my issue. Thanks again!
Reuse framebuffer after a kexec (amdgpu / efifb)
Hi all, I have a question about the possibility of reusing a framebuffer
after a regular (or panic) kexec - my case is with amdgpu (APU, aka, not
a separate GPU hardware), but I guess the question is kinda generic,
hence I've looped in most of the lists / people I think make sense
(apologies for duplicates).

The context is: we have hardware that has an amdgpu-controlled device
(Vangogh model) and, as soon as the machine boots, efifb is providing
graphics - I understand the UEFI/GRUB outputs rely on the EFI
framebuffer as well. As soon as the amdgpu module is available, the
kernel loads it and it takes over the GPU, providing graphics.

The kexec_file_load syscall allows passing a valid screen_info
structure, so by kexec'ing a new kernel, we have efifb taking over again
at boot time, but this time I see nothing on the screen. I've manually
blacklisted amdgpu in this new kexec'ed kernel; I'd like to rely on the
simple framebuffer - the goal is to have a tiny kernel kexec'ed. I'm
using kernel version 5.16.0-rc4.

I've done some other experiments, for example: I've forced the
screen_info model to match VLFB, so vesafb took over after the kexec,
with the same result. Also noticed that the BusMaster bit was off after
kexec in the AMD APU PCIe device, so I've set it on efifb before probe.
Finally, I tested the same things in qemu, with qxl, all with the same
result (blank screen).

The most interesting result I got (both with amdgpu and qemu/qxl) is
that if I blacklist these drivers and let the machine continue using
efifb since the beginning, after kexec efifb is still able to produce
graphics. Which then led me to think that likely there's something
fundamentally "blocking" the reuse of the simple framebuffer after kexec
- like maybe the DRM stack is destroying the old framebuffer somehow?
What kind of preparation is required at firmware level to make the simple EFI VGA framebuffer work, and could we perform this in a kexec (or "save it" before the amdgpu/qxl drivers take over and reuse later)? Any advice is greatly appreciated! Thanks in advance, Guilherme