[PATCH] drm/amdgpu: add safeguards for accessing mmhub CG registers

2022-01-25 Thread Lang Yu
We observed a GPU hang when querying mmhub CG status (i.e., cat amdgpu_pm_info) on cyan skillfish. Actually, cyan skillfish doesn't support any CG features. Only allow ASICs that support CG features to access the related registers. Similar safeguards will be added for other IPs in the future.
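The safeguard described in this message amounts to checking a capability mask before touching the registers. A minimal userspace sketch of that idea follows; `struct fake_adev`, the `FAKE_CG_SUPPORT_*` bits and `fake_mmhub_get_cg_state()` are hypothetical names made up for illustration and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CG capability bits, for illustration only. */
#define FAKE_CG_SUPPORT_MC_MGCG (1u << 0)
#define FAKE_CG_SUPPORT_MC_LS   (1u << 1)

/* Hypothetical stand-in for the device structure. */
struct fake_adev {
    uint32_t cg_flags;     /* CG features the ASIC supports */
    uint32_t mmhub_cg_reg; /* pretend register contents */
    unsigned reg_reads;    /* counts register accesses */
};

/* Report CG state only if the ASIC advertises CG support at all;
 * otherwise leave the register untouched and report nothing. */
static bool fake_mmhub_get_cg_state(struct fake_adev *adev, uint32_t *flags)
{
    assert(adev && flags);

    *flags = 0;
    if (!(adev->cg_flags & (FAKE_CG_SUPPORT_MC_MGCG | FAKE_CG_SUPPORT_MC_LS)))
        return false; /* no CG features: skip the register read */

    adev->reg_reads++;
    *flags = adev->mmhub_cg_reg & adev->cg_flags;
    return true;
}
```

With `cg_flags == 0` the function returns early and the access counter stays at zero, which is the behaviour the message asks for on ASICs without CG support.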

Re: [PATCH 1/1] drm/amdgpu: Wipe all VRAM on free when RAS is enabled

2022-01-25 Thread Christian König
On 25.01.22 at 16:58, Felix Kuehling wrote: On GPUs with RAS, poison can propagate between processes if VRAM is not cleared when it is freed or allocated. The reason is that not all write accesses clear RAS poison. 32-byte writes by the SDMA engine do clear RAS poison. Clearing memory in the

Re: [PATCH] drm/amdgpu: Fix an error message in rmmod

2022-01-25 Thread Yin, Tianci (Rico)
[AMD Official Use Only] Thanks Kevin and Felix! In gfxoff state, the dequeue request (by cp register writing) can't make gfxoff exit; the cp is actually powered off and the cp register write is invalid. Doorbell register writing (the regular way) or directly request smu to disable gfx

RE: [PATCH 1/1] drm/amdkfd: Don't take process mutex for svm ioctls

2022-01-25 Thread Chen, Guchun
[Public] To simplify the code, I guess we can drop the variable 'r' and use 'return svm_ioctl(p, args->op ' directly. Regards, Guchun -Original Message- From: amd-gfx On Behalf Of Philip Yang Sent: Wednesday, January 26, 2022 2:04 AM To: amd-gfx@lists.freedesktop.org Cc: Yang, Philip

Re: drm/amd/amdgpu: Add ip_discovery_text sysfs entry (v2)

2022-01-25 Thread Limonciello, Mario
A few suggestions inline. On 1/25/2022 12:18, Tom St Denis wrote: Newer hardware has a discovery table in hardware that the kernel will rely on instead of header files for things like IP offsets. This sysfs entry adds a simple-to-parse table of IP instances and segment offsets. Produces

[PATCH v3 0/3] amdgpu/pm: Implement parallel sysfs_emit solution for navi10

2022-01-25 Thread Darren Powell
== Description == Scnprintf use within the kernel is not recommended, but simple sysfs_emit replacement has not been successful due to the page alignment requirement of the function. This patch set implements a new api "emit_clock_levels" to facilitate passing both the base and offset to the

[PATCH v3 1/3] amdgpu/pm: Implement new API function "emit" that accepts buffer base and write offset

2022-01-25 Thread Darren Powell
(v3) Rewrote patchset to order patches as (API, hw impl, usecase) - added API for new power management function emit_clk_levels This function should duplicate the functionality of print_clk_levels, but this solution passes the buffer base and write offset down the
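The "buffer base and write offset" idea can be sketched in userspace as follows. This is only an illustration under stated assumptions: `sketch_emit_at()`, `sketch_emit_clk_levels()`, `SKETCH_PAGE_SIZE` and the clock values are hypothetical, and the helper merely imitates the behaviour of a page-bounded emit using standard C `vsnprintf`, not the kernel's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed page size for this sketch */

/* Append formatted text at buf + *offset, never writing past one page.
 * Advances *offset by the number of characters actually stored. */
static int sketch_emit_at(char *buf, int *offset, const char *fmt, ...)
{
    va_list args;
    int room, len;

    assert(buf && offset && *offset >= 0);
    room = SKETCH_PAGE_SIZE - *offset;
    if (room <= 1)
        return 0; /* no space left */

    va_start(args, fmt);
    len = vsnprintf(buf + *offset, (size_t)room, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;
    if (len >= room)
        len = room - 1; /* output was truncated to fit */
    *offset += len;
    return len;
}

/* Hypothetical per-ASIC callback: the caller owns the buffer base and
 * the running offset, so several emitters can fill one page in turn. */
static int sketch_emit_clk_levels(char *buf, int *offset)
{
    static const unsigned mhz[] = { 300, 800, 1200 };
    int total = 0;

    for (unsigned i = 0; i < sizeof(mhz) / sizeof(mhz[0]); i++)
        total += sketch_emit_at(buf, offset, "%u: %uMhz\n", i, mhz[i]);
    return total;
}
```

Because the callee receives the base pointer rather than a pointer into the middle of the page, the bound is always computed against the start of the buffer, which is the property the cover letter says a plain replacement could not provide.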

[PATCH v3 2/3] amdgpu/pm: Implementation of emit_clk_levels for navi10 that accepts buffer base and write offset

2022-01-25 Thread Darren Powell
(v3) Rewrote patchset to order patches as (API, hw impl, usecase) - implement emit_clk_levels for navi10, based on print_clk_levels, but using sysfs_emit without smu_cmn_get_sysfs() workaround Signed-off-by: Darren Powell --- .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c |

[PATCH v5 3/4] drm/amd: Only run s3 or s0ix if system is configured properly

2022-01-25 Thread Mario Limonciello
This will cause misconfigured systems to not run the GPU suspend routines. * In APUs that are properly configured system will go into s2idle. * In APUs that are intended to be S3 but user selects s2idle the GPU will stay fully powered for the suspend. * In APUs that are intended to be s2idle

[PATCH v5 4/4] drm/amd: don't reset dGPUs that don't go through system S3

2022-01-25 Thread Mario Limonciello
dGPUs connected to Intel systems configured for suspend to idle will not necessarily have the power rails cut at suspend and resetting the GPU may lead to problematic behaviors. Fixes: 6dc8265f9803 ("drm/amdgpu: always reset the asic in suspend (v2)") Link:

[PATCH v5 2/4] drm/amd: add support to check whether the system is set to s3

2022-01-25 Thread Mario Limonciello
This will be used to help make decisions on what to do in misconfigured systems. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 17 + 2 files changed, 19 insertions(+) diff --git

[PATCH v5 1/4] drm/amd: Warn users about potential s0ix problems

2022-01-25 Thread Mario Limonciello
On some OEM setups users can configure the BIOS for S3 or S2idle. When configured to S3 users can still choose 's2idle' in the kernel by using `/sys/power/mem_sleep`. Before commit 6dc8265f9803 ("drm/amdgpu: always reset the asic in suspend (v2)"), the GPU would crash. Now when configured this

RE: [PATCH V2 2/7] drm/amd/pm: unify the interface for retrieving enabled ppfeatures

2022-01-25 Thread Quan, Evan
[AMD Official Use Only] > -Original Message- > From: Alex Deucher > Sent: Tuesday, January 25, 2022 11:58 PM > To: Quan, Evan > Cc: amd-gfx list ; Deucher, Alexander > ; Lazar, Lijo > Subject: Re: [PATCH V2 2/7] drm/amd/pm: unify the interface for retrieving > enabled ppfeatures > >

RE: [PATCH 2/2] drm/amd/pm: fix the deadlock observed on performance_level setting

2022-01-25 Thread Quan, Evan
[AMD Official Use Only] > -Original Message- > From: Alex Deucher > Sent: Tuesday, January 25, 2022 11:35 PM > To: Quan, Evan > Cc: amd-gfx list ; Deucher, Alexander > > Subject: Re: [PATCH 2/2] drm/amd/pm: fix the deadlock observed on > performance_level setting > > On Tue, Jan 25,

Re: [PATCH 3/4] drm/amdgpu: update algorithm of umc address conversion

2022-01-25 Thread Lazar, Lijo
On 1/25/2022 4:16 PM, Tao Zhou wrote: On ALDEBARAN, we need to traverse all column bits higher than BIT11(C4C3C2) in a row, the shift of R14 bit should be also taken into account. Retire all pages we find. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 41

[PATCH] display/amd: decrease message verbosity about watermarks table failure

2022-01-25 Thread Mario Limonciello
A number of BIOS versions have a problem with the watermarks table not being configured properly. This manifests as a very scary looking warning during resume from s0i3. This should be harmless in most cases and is well understood, so decrease the assertion to a clearer warning about the

[RFC v3 12/12] Revert 'drm/amdgpu: annotate a false positive recursive locking'

2022-01-25 Thread Andrey Grodzovsky
Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 -- 1 file changed, 4 insertions(+), 10 deletions(-)

[RFC v3 07/12] drm/amdgpu: Drop concurrent GPU reset protection for device

2022-01-25 Thread Andrey Grodzovsky
Since all GPU resets are now serialized there is no need for this. This patch also reverts 'drm/amdgpu: race issue when jobs on 2 ring timeout' Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 89 ++ 1 file

[RFC v3 11/12] drm/amdgpu: Rework amdgpu_device_lock_adev

2022-01-25 Thread Andrey Grodzovsky
This function needs to be split into 2 parts, where one is called only once for locking the single instance of reset_domain's sem and reset flag, and the other part, which handles MP1 states, should still be called for each device in the XGMI hive. Signed-off-by: Andrey Grodzovsky ---

[RFC v3 10/12] drm/amdgpu: Move in_gpu_reset into reset_domain

2022-01-25 Thread Andrey Grodzovsky
We should have a single instance per entire reset domain. Signed-off-by: Andrey Grodzovsky Suggested-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +

[RFC v3 09/12] drm/amdgpu: Move reset sem into reset_domain

2022-01-25 Thread Andrey Grodzovsky
We want a single instance of the reset sem across all reset clients because in the case of XGMI we should stop cross-device MMIO access, since any of the devices could be in a reset at the moment. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -

[RFC v3 08/12] drm/amdgpu: Rework reset domain to be refcounted.

2022-01-25 Thread Andrey Grodzovsky
The reset domain now contains the register access semaphore and so needs to be present as long as each device in a hive needs it, so it cannot be bound to the XGMI hive life cycle. Address this by making the reset domain refcounted and pointed to by each member of the hive and the hive itself. Signed-off-by:
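The lifetime rule described here (the object lives until the last holder drops it) is the usual reference-counting pattern. Below is a minimal, single-threaded userspace sketch; `struct sketch_reset_domain` and the `sketch_domain_*` helpers are hypothetical names, and a kernel implementation would presumably use an atomic counter such as the `kref` helpers rather than a plain `int`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reset domain shared by every device in a hive. */
struct sketch_reset_domain {
    int refcount; /* plain int: this sketch is not thread-safe */
};

/* Allocate a domain; the creator holds the first reference. */
static struct sketch_reset_domain *sketch_domain_create(void)
{
    struct sketch_reset_domain *d = calloc(1, sizeof(*d));

    if (d)
        d->refcount = 1;
    return d;
}

/* Take an additional reference, e.g. when a device joins the hive. */
static struct sketch_reset_domain *
sketch_domain_get(struct sketch_reset_domain *d)
{
    assert(d && d->refcount > 0);
    d->refcount++;
    return d;
}

/* Drop one reference; returns 1 if this call freed the domain. */
static int sketch_domain_put(struct sketch_reset_domain *d)
{
    assert(d && d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        return 1;
    }
    return 0;
}
```

In this shape the hive and each member device would each hold one reference, so the domain outlives the hive if a device still needs the semaphore.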

[RFC v3 06/12] drm/amdgpu: Drop hive->in_reset

2022-01-25 Thread Andrey Grodzovsky
Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 1 -

[RFC v3 05/12] drm/amd/virt: For SRIOV send GPU reset directly to TDR queue.

2022-01-25 Thread Andrey Grodzovsky
No need to trigger another work queue inside the work queue. v3: Problem: Extra reset caused by host side FLR notification following guest side triggered reset. Fix: Prevent queuing flr_work from mailbox irq if guest already executing a reset. Suggested-by: Liu Shaoyun Signed-off-by: Andrey

[RFC v3 04/12] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

2022-01-25 Thread Andrey Grodzovsky
Use reset domain wq also for non TDR gpu recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR call the original recovery function directly since it's already executed from within the wq. For others just use a wrapper to

[RFC v3 03/12] drm/amdgpu: Fix crash on modprobe

2022-01-25 Thread Andrey Grodzovsky
Restrict job resubmission to the suspend case only, since schedulers are not initialised yet on probe. Signed-off-by: Andrey Grodzovsky Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git

[RFC v3 02/12] drm/amdgpu: Move scheduler init to after XGMI is ready

2022-01-25 Thread Andrey Grodzovsky
Before we initialize schedulers we must know which reset domain we are in - for a single device there is a single domain per device and so a single wq per device. For XGMI the reset domain spans the entire XGMI hive and so the reset wq is per hive. Signed-off-by: Andrey Grodzovsky ---

[RFC v3 01/12] drm/amdgpu: Introduce reset domain

2022-01-25 Thread Andrey Grodzovsky
Defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another. Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky Suggested-by: Daniel Vetter Suggested-by: Christian König Reviewed-by: Christian

[RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-01-25 Thread Andrey Grodzovsky
This patchset is based on earlier work by Boris[1] that allowed to have an ordered workqueue at the driver level that will be used by the different schedulers to queue their timeout work. On top of that I also serialized any GPU reset we trigger from within amdgpu code to also go through the same

Re: [PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry (v2)

2022-01-25 Thread Luben Tuikov
Inlined: On 2022-01-25 13:18, Tom St Denis wrote: > Newer hardware has a discovery table in hardware that the kernel will > rely on instead of header files for things like IP offsets. This > sysfs entry adds a simple to parse table of IP instances and segment > offsets. > > Produces output that

Re: [PATCH] drm/amdgpu/display: Remove t_srx_delay_us.

2022-01-25 Thread Alex Deucher
Applied. Thanks! Alex On Tue, Jan 25, 2022 at 12:53 PM Harry Wentland wrote: > > On 2022-01-22 21:38, Bas Nieuwenhuizen wrote: > > Unused. Convert the divisions into asserts on the divisor, to > > debug why it is zero. The divide by zero is suspected of causing > > kernel panics. > > > > While

Re: [PATCH 1/1] drm/amdkfd: Don't take process mutex for svm ioctls

2022-01-25 Thread Felix Kuehling
On 2022-01-25 at 13:04, Philip Yang wrote: SVM ioctls take the proper svms->lock to handle race conditions, so there is no need to take the process mutex to serialize ioctls. This also fixes a circular locking warning: WARNING: possible circular locking dependency detected Possible unsafe locking scenario:

[PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry (v2)

2022-01-25 Thread Tom St Denis
Newer hardware has a discovery table in hardware that the kernel will rely on instead of header files for things like IP offsets. This sysfs entry adds a simple to parse table of IP instances and segment offsets. Produces output that looks like: $ cat ip_discovery_text ATHUB{0} v2.0.0: 0c00

[PATCH 1/1] drm/amdkfd: Don't take process mutex for svm ioctls

2022-01-25 Thread Philip Yang
SVM ioctls take the proper svms->lock to handle race conditions, so there is no need to take the process mutex to serialize ioctls. This also fixes a circular locking warning: WARNING: possible circular locking dependency detected Possible unsafe locking scenario: CPU0 CPU1

Re: [PATCH] drm/amdgpu/display: Remove t_srx_delay_us.

2022-01-25 Thread Harry Wentland
On 2022-01-22 21:38, Bas Nieuwenhuizen wrote: > Unused. Convert the divisions into asserts on the divisor, to > debug why it is zero. The divide by zero is suspected of causing > kernel panics. > > While I have no idea where the zero is coming from I think this > patch is a positive either way. >

Re: [PATCH v2 2/8] drm/amdkfd: Add KFD SMI event IDs and triggers

2022-01-25 Thread Felix Kuehling
On 2022-01-20 at 18:13, Philip Yang wrote: Define new system management interface event IDs, migration triggers and user queue eviction triggers; those will be implemented in the following patches. Signed-off-by: Philip Yang --- include/uapi/linux/kfd_ioctl.h | 27

Re: [PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry

2022-01-25 Thread Alex Deucher
On Tue, Jan 25, 2022 at 11:42 AM StDenis, Tom wrote: > > I literally brought this up in our initial discussion > > Frankly from umrs point of view a single file is easier. > > But I can't code anything until it's in the tree... yeah, the single file is arguably easier to deal with. We could

Re: [PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry

2022-01-25 Thread StDenis, Tom
I literally brought this up in our initial discussion Frankly from umrs point of view a single file is easier. But I can't code anything until it's in the tree... Tom From: Alex Deucher Sent: Tuesday, January 25, 2022 11:39 To: StDenis, Tom Cc:

Re: [PATCH] drm/amd/display: Add Missing HPO Stream Encoder Function Hook

2022-01-25 Thread Harry Wentland
On 2022-01-25 01:25, Fangzhi Zuo wrote: > [Why] > configure_dp_hpo_throttled_vcp_size() was missing promotion before, but it > was covered by > not calling the missing function hook in the old interface > hpo_dp_link_encoder->funcs. > > Recent refactor replaces with new caller

Re: [PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry

2022-01-25 Thread Alex Deucher
On Mon, Jan 24, 2022 at 1:07 PM Tom St Denis wrote: > > Newer hardware has a discovery table in hardware that the kernel will > rely on instead of header files for things like IP offsets. This > sysfs entry adds a simple to parse table of IP instances and segment > offsets. > > Produces output

Re: [PATCH v2 1/8] drm/amdkfd: Correct SMI event read size

2022-01-25 Thread Felix Kuehling
On 2022-01-20 at 18:13, Philip Yang wrote: sizeof(buf) is 8 bytes because it is defined as unsigned char *buf, so each SMI event read copies at most 8 bytes to the user buffer. Correct this by using the allocated size of buf. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling ---
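The bug class described in this message is easy to reproduce outside the kernel: `sizeof` applied to a pointer yields the pointer's size, not the size of what it points to. The sketch below is an illustration only; `copy_len_buggy()`, `copy_len_fixed()` and `SKETCH_MSG_MAX` are hypothetical names and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MSG_MAX 96 /* hypothetical allocation size */

/* Buggy: sizeof(buf) is the size of the pointer, not of the allocation,
 * so the copy length is capped at 8 bytes on a typical 64-bit target. */
static size_t copy_len_buggy(const unsigned char *buf, size_t available)
{
    size_t limit = sizeof(buf);

    return available < limit ? available : limit;
}

/* Fixed: bound the copy by the size that was actually allocated. */
static size_t copy_len_fixed(size_t alloc_size, size_t available)
{
    assert(alloc_size > 0);
    return available < alloc_size ? available : alloc_size;
}
```

With 50 bytes of event data available, the buggy variant returns only the pointer size, while the fixed variant returns all 50.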

Re: [PATCH 2/2] drm/amdgpu/display: use msleep rather than udelay for long delays

2022-01-25 Thread Alex Deucher
On Tue, Jan 25, 2022 at 6:32 AM Lazar, Lijo wrote: > > > > On 1/20/2022 11:34 PM, Alex Deucher wrote: > > Some architectures (e.g., ARM) throw a compilation error if the > > udelay is too long. In general udelays of longer than 2000us are > > not recommended on any architecture. Switch to

Re: [PATCH] drm/amd/amdgpu: Add ip_discovery_text sysfs entry

2022-01-25 Thread Luben Tuikov
Patch is fine, if it does what you want. A few comments inline. On 2022-01-24 13:07, Tom St Denis wrote: > Newer hardware has a discovery table in hardware that the kernel will > rely on instead of header files for things like IP offsets. This > sysfs entry adds a simple to parse table of IP

Re: [PATCH V2 3/7] drm/amd/pm: update the data type for retrieving enabled ppfeatures

2022-01-25 Thread Alex Deucher
Reviewed-by: Alex Deucher On Tue, Jan 25, 2022 at 4:00 AM Evan Quan wrote: > > Use uint64_t instead of an array of uint32_t. This can avoid > some non-necessary intermediate uint32_t -> uint64_t conversions. > > Signed-off-by: Evan Quan > Change-Id: I4e217357203a23440f058d7e25f55eaebd15c5ef >
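The conversion this patch removes can be illustrated with a small sketch: a feature mask carried as two 32-bit halves has to be stitched together before use, whereas a `uint64_t` can be tested directly. `mask_from_halves()` and `feature_enabled()` are hypothetical helpers written for this example, and the low-half-first ordering is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old style: the mask arrives as two 32-bit halves (low half first,
 * by assumption) and is stitched together by the caller. */
static uint64_t mask_from_halves(const uint32_t halves[2])
{
    assert(halves);
    return ((uint64_t)halves[1] << 32) | halves[0];
}

/* New style: callers work on the 64-bit mask directly. */
static bool feature_enabled(uint64_t mask, unsigned bit)
{
    assert(bit < 64);
    return (mask >> bit) & 1u;
}
```

Passing the 64-bit value end to end removes the intermediate step in `mask_from_halves()` from every call site.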

[PATCH 1/1] drm/amdgpu: Wipe all VRAM on free when RAS is enabled

2022-01-25 Thread Felix Kuehling
On GPUs with RAS, poison can propagate between processes if VRAM is not cleared when it is freed or allocated. The reason is that not all write accesses clear RAS poison. 32-byte writes by the SDMA engine do clear RAS poison. Clearing memory in the background when it is freed should avoid major

Re: [PATCH V2 1/7] drm/amd/pm: correct the way for retrieving enabled ppfeatures on Renoir

2022-01-25 Thread Alex Deucher
Acked-by: Alex Deucher On Tue, Jan 25, 2022 at 4:00 AM Evan Quan wrote: > > As other dGPU asics, Renoir should use smu_cmn_get_enabled_mask() for > that job. > > Signed-off-by: Evan Quan > Change-Id: I9e845ba84dd45d0826506de44ef4760fa851a516 > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 3

Re: [PATCH V2 2/7] drm/amd/pm: unify the interface for retrieving enabled ppfeatures

2022-01-25 Thread Alex Deucher
On Tue, Jan 25, 2022 at 4:00 AM Evan Quan wrote: > > Instead of having two which do the same thing. > > Signed-off-by: Evan Quan > Change-Id: I6302c9b5abdb999c4b7c83a0d1852181208b1c1f > --- > .../amd/pm/swsmu/smu11/cyan_skillfish_ppt.c | 2 +- > .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c

Re: Build regressions/improvements in v5.17-rc1

2022-01-25 Thread Thomas Bogendoerfer
On Mon, Jan 24, 2022 at 08:55:40AM +0100, Geert Uytterhoeven wrote: > > + /kisskb/src/lib/test_printf.c: error: "PTR" redefined [-Werror]: => > > 247:0, 247 > > + /kisskb/src/sound/pci/ca0106/ca0106.h: error: "PTR" redefined [-Werror]: > > => 62, 62:0 > > mips-gcc8/mips-allmodconfig >

Re: [PATCH 2/2] drm/amd/pm: fix the deadlock observed on performance_level setting

2022-01-25 Thread Alex Deucher
On Tue, Jan 25, 2022 at 3:57 AM Evan Quan wrote: > > The sub-routine (amdgpu_gfx_off_ctrl) tried to obtain the lock > adev->pm.mutex which was actually held by amdgpu_dpm_force_performance_level. > A deadlock happened then. > > Signed-off-by: Evan Quan > Change-Id:

Re: [PATCH 2/2] drm/amdgpu/display: use msleep rather than udelay for long delays

2022-01-25 Thread Harry Wentland
On 2022-01-25 06:32, Lazar, Lijo wrote: > > > On 1/20/2022 11:34 PM, Alex Deucher wrote: >> Some architectures (e.g., ARM) throw a compilation error if the >> udelay is too long. In general udelays of longer than 2000us are >> not recommended on any architecture. Switch to msleep in these

Re: [PATCH] drm/amdgpu: Fix an error message in rmmod

2022-01-25 Thread Felix Kuehling
I have no objection to the change. It restores the sequence that was used before e9669fb78262. But I don't understand why GFX_OFF is causing a preemption error during module unload, but not when KFD is in normal use. Maybe it's because of the compute power profile that's normally set by

Re: [PATCH 1/2] drm/amd/pm: correct the MGpuFanBoost support for Beige Goby

2022-01-25 Thread Alex Deucher
Reviewed-by: Alex Deucher On Tue, Jan 25, 2022 at 3:57 AM Evan Quan wrote: > > The existing way cannot handle Beige Goby well as a different > PPTable data structure(PPTable_beige_goby_t instead of PPTable_t) > is used there. > > Signed-off-by: Evan Quan > Change-Id:

RE: Build regressions/improvements in v5.17-rc1

2022-01-25 Thread D, Lakshmi Sowjanya
> -Original Message- > From: Geert Uytterhoeven > Sent: Monday, January 24, 2022 1:26 PM > To: linux-ker...@vger.kernel.org > Cc: linuxppc-...@lists.ozlabs.org; sparcli...@vger.kernel.org; linux- > u...@lists.infradead.org; D, Lakshmi Sowjanya > ; k...@vger.kernel.org; linux- >

Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

2022-01-25 Thread Lazar, Lijo
On 1/25/2022 5:28 AM, James Turner wrote: Hi Lijo, Not able to relate to how it affects gfx/mem DPM alone. Unless Alex has other ideas, would you be able to enable drm debug messages and share the log? Sure, I'm happy to provide drm debug messages. Enabling everything (0x1ff) generates *a

Re: [PATCH 2/2] drm/amdgpu/display: use msleep rather than udelay for long delays

2022-01-25 Thread Lazar, Lijo
On 1/20/2022 11:34 PM, Alex Deucher wrote: Some architectures (e.g., ARM) throw a compilation error if the udelay is too long. In general udelays of longer than 2000us are not recommended on any architecture. Switch to msleep in these cases. Signed-off-by: Alex Deucher ---

Re: [PATCH] drm/amdgpu: Fix an error message in rmmod

2022-01-25 Thread Wang, Yang(Kevin)
[AMD Official Use Only] The issue was introduced by the following patch, so it would be better to add this information: fixes: (e9669fb78262) drm/amdgpu: Add early fini callback Reviewed-by: Yang Wang Best Regards, Kevin From: amd-gfx on behalf of Tianci Yin Sent:

[PATCH 4/4] drm/amdgpu: get hash bit for CH4 in umc channel index

2022-01-25 Thread Tao Zhou
On ALDEBARAN, the umc channel bits are not original values, they are hashed. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 8 drivers/gpu/drm/amd/amdgpu/umc_v6_7.h | 15 +++ 2 files changed, 23 insertions(+) diff --git

[PATCH 2/4] drm/amdgpu: increase bad page number for umc ras query

2022-01-25 Thread Tao Zhou
One piece of umc normalizing address can be mapped to 16 pieces of physical address in each umc channel on ALDEBARAN. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 ++- drivers/gpu/drm/amd/amdgpu/umc_v6_7.h | 4 2 files changed, 6 insertions(+), 1 deletion(-) diff

[PATCH 3/4] drm/amdgpu: update algorithm of umc address conversion

2022-01-25 Thread Tao Zhou
On ALDEBARAN, we need to traverse all column bits higher than BIT11(C4C3C2) in a row, the shift of R14 bit should be also taken into account. Retire all pages we find. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/umc_v6_7.c | 41 +--

[PATCH 1/4] drm/amdgpu: add umc_fill_error_record to make code more simple

2022-01-25 Thread Tao Zhou
Create common amdgpu_umc_fill_error_record function for all versions of UMC and clean up related codes. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 23 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 21 +++

[PATCH] drm/amdgpu: Fix an error message in rmmod

2022-01-25 Thread Tianci Yin
From: "Tianci.Yin" [why] In the rmmod procedure, kfd sends cp a dequeue request, but the request does not get a response, so an error message "cp queue pipe 4 queue 0 preemption failed" is printed. [how] Suspending kfd after disabling gfxoff can fix it. Change-Id:

RE: [PATCH 3/7] drm/amd/pm: drop the redundant 'supported' member of smu_feature structure

2022-01-25 Thread Quan, Evan
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Monday, January 24, 2022 1:03 PM > To: Quan, Evan ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Chen, Guchun > ; Huang, Ray > Subject: Re: [PATCH 3/7] drm/amd/pm: drop the redundant 'supported' >

[PATCH V2 7/7] drm/amd/pm: revise the implementation of smu_cmn_disable_all_features_with_exception

2022-01-25 Thread Evan Quan
As there is no internal cache for enabled ppfeatures now, the 2nd parameter is no longer needed. Signed-off-by: Evan Quan Change-Id: I0c1811f216c55d6ddfabdc9e099dc214c21bdf2e --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 9 ++---

[PATCH V2 6/7] drm/amd/pm: avoid consecutive retrieving for enabled ppfeatures

2022-01-25 Thread Evan Quan
As the enabled ppfeatures have just been retrieved, we can use them directly instead of retrieving them again and again. Signed-off-by: Evan Quan Change-Id: I08827437fcbbc52084418c8ca6a90cfa503306a9 --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 10 +- 1 file changed, 9 insertions(+), 1

[PATCH V2 5/7] drm/amd/pm: drop the cache for enabled ppfeatures

2022-01-25 Thread Evan Quan
The following scenarios make the driver cache for enabled ppfeatures outdated and invalid: - Other tools interact with PMFW to change the enabled ppfeatures. - PMFW may enable/disable some features behind the driver's back. E.g. for sienna_cichlid, on gfxoff entering, PMFW will disable gfx

[PATCH V2 4/7] drm/amd/pm: correct the usage for 'supported' member of smu_feature structure

2022-01-25 Thread Evan Quan
The supported features should be retrieved just after the EnableAllDpmFeatures message completes. And the check (whether some dpm feature is supported) is only needed when we decide to enable or disable it. Signed-off-by: Evan Quan Change-Id: I07c9a5ac5290cd0d88a40ce1768d393156419b5a ---

[PATCH V2 2/7] drm/amd/pm: unify the interface for retrieving enabled ppfeatures

2022-01-25 Thread Evan Quan
Instead of having two which do the same thing. Signed-off-by: Evan Quan Change-Id: I6302c9b5abdb999c4b7c83a0d1852181208b1c1f --- .../amd/pm/swsmu/smu11/cyan_skillfish_ppt.c | 2 +- .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 6 +- .../drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c | 6 +-

[PATCH V2 1/7] drm/amd/pm: correct the way for retrieving enabled ppfeatures on Renoir

2022-01-25 Thread Evan Quan
As other dGPU asics, Renoir should use smu_cmn_get_enabled_mask() for that job. Signed-off-by: Evan Quan Change-Id: I9e845ba84dd45d0826506de44ef4760fa851a516 --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

Re: [REGRESSION] Too-low frequency limit for AMD GPU PCI-passed-through to Windows VM

2022-01-25 Thread James Turner
Hi Lijo, > Not able to relate to how it affects gfx/mem DPM alone. Unless Alex > has other ideas, would you be able to enable drm debug messages and > share the log? Sure, I'm happy to provide drm debug messages. Enabling everything (0x1ff) generates *a lot* of log messages, though. Is there a

Re: Build regressions/improvements in v5.17-rc1

2022-01-25 Thread Randy Dunlap
On 1/24/22 17:23, Felix Kuehling wrote: > > Am 2022-01-24 um 14:11 schrieb Randy Dunlap: >> On 1/24/22 10:55, Geert Uytterhoeven wrote: >>> Hi Alex, >>> >>> On Mon, Jan 24, 2022 at 7:52 PM Alex Deucher wrote: On Mon, Jan 24, 2022 at 5:25 AM Geert Uytterhoeven wrote: > On Sun,

[PATCH 2/2] drm/amd/pm: fix the deadlock observed on performance_level setting

2022-01-25 Thread Evan Quan
The sub-routine (amdgpu_gfx_off_ctrl) tried to obtain the lock adev->pm.mutex which was actually held by amdgpu_dpm_force_performance_level. A deadlock happened then. Signed-off-by: Evan Quan Change-Id: Id692829381dedc6380f5464d74107d696f7abca1 --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 50
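The shape of this deadlock (a function holding a non-recursive lock calls a helper that takes the same lock) can be shown with a toy lock that reports the second acquire instead of hanging. Everything below is hypothetical and for illustration only; in particular, `caller_reordered()` shows just one possible way out and is not claimed to be what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: a second acquire by the holder fails here,
 * where a real mutex would simply block forever. */
struct toy_mutex {
    bool held;
};

static bool toy_lock(struct toy_mutex *m)
{
    assert(m);
    if (m->held)
        return false; /* would deadlock with a real mutex */
    m->held = true;
    return true;
}

static void toy_unlock(struct toy_mutex *m)
{
    assert(m && m->held);
    m->held = false;
}

/* Hypothetical helper that takes the lock itself. */
static bool helper_taking_lock(struct toy_mutex *m)
{
    if (!toy_lock(m))
        return false;
    /* ... work done under the lock ... */
    toy_unlock(m);
    return true;
}

/* Problem shape: the caller already holds the lock when it calls
 * the helper, so the helper's acquire can never succeed. */
static bool caller_nested(struct toy_mutex *m)
{
    bool ok;

    if (!toy_lock(m))
        return false;
    ok = helper_taking_lock(m);
    toy_unlock(m);
    return ok;
}

/* One possible fix: call the helper before taking the lock. */
static bool caller_reordered(struct toy_mutex *m)
{
    if (!helper_taking_lock(m))
        return false;
    if (!toy_lock(m))
        return false;
    /* ... work that needs the lock ... */
    toy_unlock(m);
    return true;
}
```

Another common option is to split the helper into a variant that expects the lock to be held already, so the caller can keep a single critical section.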

[PATCH 1/2] drm/amd/pm: correct the MGpuFanBoost support for Beige Goby

2022-01-25 Thread Evan Quan
The existing way cannot handle Beige Goby well as a different PPTable data structure(PPTable_beige_goby_t instead of PPTable_t) is used there. Signed-off-by: Evan Quan Change-Id: I02208c011e93c4d37769bd022e65e9084faa97e4 --- drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 6 +++--- 1