Re: [PATCH v4 00/10] Add PM policy interfaces

2024-05-15 Thread Lazar, Lijo
On 5/14/2024 4:35 PM, Lijo Lazar wrote: > This series adds APIs to get the supported PM policies and also set them. A PM > policy type is a predefined policy type supported by an SOC and each policy > may > define two or more levels to choose from. A user can select the appropriate > level

Re: [PATCH 2/2 v2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 12:28 PM, Jesse Zhang wrote: > To avoid warning problems, drop index and > use PPSMC_MSG_GfxDriverReset instead of index for aldebaran. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++-- > 1

Re: [PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 12:37 PM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > -----Original Message----- > From: amd-gfx On Behalf Of Lazar, Lijo > Sent: Tuesday, May 14, 2024 2:07 PM > To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop

Re: [PATCH] drm/amdgpu/pm: Drop hard-code value of usTMax

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 9:43 AM, Ma Jun wrote: > Drop hard-code value of usTMax because we read this > value from fantable below. > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 -- > 1 file changed, 2

Re: [PATCH v2] drm/amdgpu: Fix the null pointer dereference to ras_manager

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 9:42 AM, Ma Jun wrote: > Check ras_manager before using it > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git

Re: [PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 11:34 AM, Jesse Zhang wrote: > To avoid warning problems, drop index and > use PPSMC_MSG_GfxDriverReset instead of index for aldebaran. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++-- > 1

Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle

2024-05-13 Thread Lazar, Lijo
On 5/14/2024 6:30 AM, Ma, Jun wrote: > Hi Lijo & Kevin, thanks for review, will drop this patch > In the original function below check is there. if (!handle || !info || type >= ACA_ERROR_TYPE_COUNT) return -EINVAL; So moving this to a later stage is still valid.

Re: [PATCH 05/22] drm/amd/pm: check specific index for aldebaran

2024-05-13 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 4:27 PM, Lazar, Lijo wrote: > > > On 5/10/2024 8:20 AM, Jesse Zhang wrote: >> Check for specific indexes that may be invalid values. >> >> Signed-off-by: Jesse Zhang >> --- >> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c

Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13

2024-05-13 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git

Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check handle pointer before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >

Re: [PATCH 2/5] drm/amdgpu: Fix the null pointer dereference to ras_manager

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check ras_manager before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c >

Re: [PATCH v3] drm/amdgpu: Add Ring Hang Events

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 9:44 AM, Ori Messinger wrote: > This patch adds 'ring hang' events to the driver. > This is done by adding a 'reset_ring_hang' bool variable to the > struct 'amdgpu_reset_context' in the amdgpu_reset.h file. > The purpose for this 'reset_ring_hang' variable is whenever a GPU >

Re: [PATCH] drm/amdkfd: Ensure gpu_id is unique

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:36 AM, Harish Kasiviswanathan wrote: > gpu_id needs to be unique for user space to identify GPUs via KFD > interface. In the current implementation there is a very small > probability of having non unique gpu_ids. > > v2: Add check to confirm if gpu_id is unique. If not unique,

Re: [PATCH 19/22 V2] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:56 PM, Jesse Zhang wrote: > Checks the partition mode and returns an error for an invalid mode. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git

Re: [PATCH 19/22] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:09 PM, Zhang, Jesse(Jie) wrote: > [AMD Official Use Only - General] > > Hi Lijo, > > -----Original Message----- > From: amd-gfx On Behalf Of Lazar, Lijo > Sent: Friday, May 10, 2024 3:16 PM > To: amd-gfx@lists.freedesktop.org > Subject: Re: [P

Re: [PATCH 19/22] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Dividing expression num_xcc_per_xcp which may be zero has undefined behavior. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git

Re: [PATCH] drm/amdgpu: Fix truncation by resizing ucode_prefix in imu_v12_0_init_microcode

2024-05-08 Thread Lazar, Lijo
On 5/7/2024 10:14 PM, Srinivasan Shanmugam wrote: > This commit fixes potential truncation when writing the string _imu.bin > into the fw_name buffer in the imu_v12_0_init_microcode function in the > imu_v12_0.c file > > The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf >

Re: [PATCH] drm/amdgpu: Fix buffer size to prevent truncation in gfx_v12_0_init_microcode

2024-05-08 Thread Lazar, Lijo
On 5/7/2024 10:14 PM, Srinivasan Shanmugam wrote: > This commit addresses multiple warnings in the gfx_v12_0_init_microcode > function in the gfx_v12_0.c file. The warnings were related to potential > truncation when writing the strings _pfp.bin, _me.bin, _rlc.bin, and > _mec.bin into the

Re: [PATCH] drm/amdkfd: Ensure gpu_id is unique

2024-05-06 Thread Lazar, Lijo
On 5/4/2024 3:36 AM, Harish Kasiviswanathan wrote: > gpu_id needs to be unique for user space to identify GPUs via KFD > interface. In the current implementation there is a very small > probability of having non unique gpu_ids. > > v2: Add check to confirm if gpu_id is unique. If not unique,

Re: [PATCH] drm/amdgpu: Assign correct bits for SDMA HDP flush

2024-05-06 Thread Lazar, Lijo
On 5/7/2024 6:00 AM, Harry Wentland wrote: > This patch is causing crashes of Manor Lords on my Navi 21 on the 6.8.9 > stable kernel. It leads to an assertion failure in wine: > > File: ../src-wine/dlls/winevulkan/loader_thunks.c > Line: 3621 > > Expression "!status && vkEndCommandBuffer"" >

Re: [PATCH] drm/amd/amdgpu: Check tbo resource pointer

2024-05-02 Thread Lazar, Lijo
On 5/2/2024 7:01 PM, Asad Kamal wrote: > Validate tbo resource pointer, skip if NULL > > Signed-off-by: Asad Kamal > Reviewed-by: Christian König Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- > 1 file changed, 2 insertions(+), 1

Re: [PATCH v3 1/4] drm/amdgpu: Fix two reset triggered in a row

2024-04-30 Thread Lazar, Lijo
On 4/26/2024 9:27 AM, Yunxiang Li wrote: > Sometimes a hung GPU causes multiple reset sources to schedule resets. > The second source will be able to trigger an unnecessary reset if they > schedule after we call amdgpu_device_stop_pending_resets. > > Move amdgpu_device_stop_pending_resets to

Re: [PATCH] drm/amdgpu: avoid reading vf2pf info size from FB

2024-04-30 Thread Lazar, Lijo
On 4/30/2024 7:53 PM, Zhigang Luo wrote: > VF can't access FB when host is doing mode1 reset. Using sizeof to get > vf2pf info size, instead of reading it from vf2pf header stored in FB. > > Signed-off-by: Zhigang Luo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- > 1 file changed,

Re: [PATCH] drm/amdgpu: add mutex to protect ras shared memory

2024-04-29 Thread Lazar, Lijo
On 4/28/2024 12:38 PM, YiPeng Chai wrote: > Add mutex to protect ras shared memory. > > Signed-off-by: YiPeng Chai > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 121 ++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 1 + >

Re: [PATCH] drm/amdgpu: skip ip dump if devcoredump flag is set

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 3:53 PM, Sunil Khatri wrote: > Do not dump the ip registers during driver reload > in passthrough environment. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git

Re: [PATCH v2 2/2] drm/amdgpu: Fix the uninitialized variable warning

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 3:30 PM, Ma Jun wrote: > Initialize the phy_id to 0 to fix the warning of > "Using uninitialized value phy_id" > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH v2] drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 12:05 PM, Srinivasan Shanmugam wrote: > The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating > warnings about potential truncation of output when using the snprintf function. > The issue was due to the size of the buffer 'ucode_prefix' being too > small to accommodate the

Re: [PATCH] drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()

2024-04-24 Thread Lazar, Lijo
On 4/23/2024 7:13 AM, Srinivasan Shanmugam wrote: > The buffer size is determined by the declaration char fw_name[30]; This > means fw_name can hold up to 30 characters, including the null character > that marks the end of the string. > > The string to be written is "amdgpu/%s_mec.bin" or

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Lazar, Lijo
On 4/23/2024 1:15 AM, Yunxiang Li wrote: > Reset request from KFD is missing a check for if a reset is already in > progress, this causes a second reset to be triggered right after the > previous one finishes. Add the check to align with the other reset sources. > > Signed-off-by: Yunxiang Li

Re: [PATCH v4] drm/amdgpu: Fix snprintf buffer size in smu_v14_0_init_microcode

2024-04-22 Thread Lazar, Lijo
On 4/19/2024 9:14 PM, Srinivasan Shanmugam wrote: > This commit addresses buffer overflow in the smu_v14_0_init_microcode > function. The issue was about the snprintf function writing more bytes > into the fw_name buffer than it can hold. > > The line of code is: > > snprintf(fw_name,

Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 4:52 PM, Christian König wrote: > On 22.04.24 at 11:37, Lazar, Lijo wrote: >> >> On 4/22/2024 2:59 PM, Christian König wrote: >>> On 22.04.24 at 10:47, Jack Xiao wrote: >>>> Delete fence fallback timer to fix the random >>>>

Re: [PATCH v2] drm/amdgpu/mes: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 3:09 PM, Jack Xiao wrote: > Delete fence fallback timer to fix the random > use-after-free issue. > > v2: move to amdgpu_mes.c > > Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + > 1 file changed, 1

Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 2:59 PM, Christian König wrote: > On 22.04.24 at 10:47, Jack Xiao wrote: >> Delete fence fallback timer to fix the random >> use-after-free issue. > > That's already done in amdgpu_fence_driver_hw_fini() and absolutely > shouldn't be in amdgpu_ring_fini(). > > And the

Re: [PATCH] drm/amdgpu: init microcode chip name from ip versions

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 11:23 AM, Le Ma wrote: > To adapt to different gc versions in gfx_v9_4_3.c file. > > Change-Id: Ib4465aade0dcbbcc43318c6dc865f813c5411097 > Signed-off-by: Le Ma > Reviewed-by: Hawking Zhang Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c

Re: [PATCH] drm/amdgpu: Update CGCG settings for GFXIP 9.4.3

2024-04-21 Thread Lazar, Lijo
On 4/22/2024 6:42 AM, Rajneesh Bhardwaj wrote: > Tune coarse grain clock gating idle threshold and rlc idle timeout to > achieve better kernel launch latency. > > Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8

RE: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-17 Thread Lazar, Lijo
). If that happens, then the purpose of the patch - to get the context of a device hang - is lost. We may not even get a proper dmesg log. Thanks, Lijo -----Original Message----- From: Khatri, Sunil Sent: Wednesday, April 17, 2024 9:42 PM To: Lazar, Lijo ; Alex Deucher ; Khatri, Sunil Cc

Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 9:21 PM, Alex Deucher wrote: > On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote: >> >> Adding gfx10 gc registers to be used for register >> dump via devcoredump during a gpu reset. >> >> Signed-off-by: Sunil Khatri > > Reviewed-by: Alex Deucher > >> --- >>

Re: [PATCH] drm/amdgpu/pm: Remove gpu_od if it's an empty directory

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 11:23 AM, Ma Jun wrote: > gpu_od should be removed if it's an empty directory > > Signed-off-by: Ma Jun > Reported-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c >

Re: [PATCH] drm/amdgpu/pm: Print od status info

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 3:10 PM, Ma Jun wrote: > Print the od status info if it's not supported. > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c >

Re: [PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 1:14 PM, Khatri, Sunil wrote: > > On 4/17/2024 1:06 PM, Khatri, Sunil wrote: >> devcoredump is used to debug gpu hangs/resets. So in normal process >> when there is a hang due to ring timeout or page fault we are doing a >> hard reset as soft reset fail in those cases. How are we

Re: [PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-16 Thread Lazar, Lijo
On 4/17/2024 9:43 AM, Ahmad Rehman wrote: > In passthrough environment, the driver triggers the mode-1 reset on > reload. The reset causes the core dump collection which is delayed task > and prevents driver from unloading until it is completed. Since we do > not need to collect data on "reset

Re: [PATCH] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-16 Thread Lazar, Lijo
On 4/17/2024 12:05 AM, Ahmad Rehman wrote: > In passthrough environment, the driver triggers the mode-1 reset on > reload. The reset causes the core dump collection which is delayed task > and prevents driver from unloading until it is completed. Since we do > not need to collect data on "reset

RE: [PATCH] drm/amdgpu: Use driver mode reset for data poison

2024-04-16 Thread Lazar, Lijo
[Public] Is this applicable for aldebaran also? Thanks, Lijo -----Original Message----- From: amd-gfx On Behalf Of Hawking Zhang Sent: Tuesday, April 16, 2024 11:46 AM To: amd-gfx@lists.freedesktop.org; Zhou1, Tao Cc: Zhang, Hawking Subject: [PATCH] drm/amdgpu: Use driver mode reset for data

Re: [PATCH v2] drm/amdgpu: refactoring the runtime pm mode detection code

2024-04-15 Thread Lazar, Lijo
On 4/3/2024 8:27 AM, Ma Jun wrote: > refactor the code of runtime pm mode detection to support > amdgpu_runtime_pm =2 and 1 two cases > > Signed-off-by: Ma Jun > Reviewed-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo > --- > v1->v2: > - Fix logic and output info (Lijo) > - Fix code

Re: [PATCH] drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset

2024-04-11 Thread Lazar, Lijo
On 4/8/2024 10:50 PM, Alex Deucher wrote: > Need to take the srbm_mutex and while we are here, use the > helper function soc21_grbm_select(); > > Signed-off-by: Alex Deucher Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 15 +-- > 1 file

Re: [PATCH 2/2] amd/amdgpu: improve VF recover time

2024-04-04 Thread Lazar, Lijo
On 4/3/2024 11:42 PM, Zhigang Luo wrote: > 1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. > 2. set fatal error detected flag. > > Signed-off-by: Zhigang Luo Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + >

Re: [PATCH 1/2] drm/amd/pm: Report uclk and sclk limit

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 4:00 PM, Lazar, Lijo wrote: > > > On 4/2/2024 3:52 PM, Asad Kamal wrote: >> Report max set uclk and sclk for smu_v_13_0_6 >> > > You may rephrase as > > "Use OD (pp_od_clk_voltage) interface to report current limits, default >

Re: [PATCH 2/2] drm/amd/pm: Update uclk/sclk limit report format

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 3:52 PM, Asad Kamal wrote: > Update max set uclk and sclk reporting format for smu_v_13_0_0 > Use aldebaran instead of smu v13.0.0 - both are different. You may also add the description similar to patch 1. With those updates, Reviewed-by: Lijo Lazar Thanks, Lijo >

Re: [PATCH 1/2] drm/amd/pm: Report uclk and sclk limit

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 3:52 PM, Asad Kamal wrote: > Report max set uclk and sclk for smu_v_13_0_6 > You may rephrase as "Use OD (pp_od_clk_voltage) interface to report current limits, default or those set by user, for SCLK and UCLK." Thanks, Lijo > Signed-off-by: Asad Kamal > --- >

Re: [PATCH] drm/amdgpu: refactoring the runtime pm mode detection code

2024-04-01 Thread Lazar, Lijo
On 3/29/2024 1:58 PM, Ma Jun wrote: > refactor the code of runtime pm mode detection to support > amdgpu_runtime_pm =2 and 1 two cases > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 68 ++

Re: [PATCH v2 2/4] drm/amd/pm: Add PMFW message and capability flags

2024-04-01 Thread Lazar, Lijo
On 4/1/2024 4:45 PM, Kamal, Asad wrote: > [AMD Official Use Only - General] > > -----Original Message----- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 8:06 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Wang, Yang(Kevin) >

Re: [PATCH v3 1/5] drm/amdgpu: Add a new runtime mode definition

2024-03-28 Thread Lazar, Lijo
On 3/27/2024 4:40 PM, Ma Jun wrote: > Add a new runtime pm mode AMDGPU_RUNPM_BAMACO > and related macro definition > > Signed-off-by: Ma Jun Series is Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 4 > 1 file changed, 4 insertions(+) > >

Re: [PATCH] drm/amdgpu: Reset dGPU if suspend got aborted

2024-03-27 Thread Lazar, Lijo
On 3/28/2024 8:49 AM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - General] > > -----Original Message----- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 11:06 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > > Subject:

Re: [PATCH v2 2/4] drm/amd/pm: Add PMFW message and capability flags

2024-03-27 Thread Lazar, Lijo
On 3/28/2024 8:57 AM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - General] > > -----Original Message----- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 10:36 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Wang,

Re: [PATCH] drm/amdgpu: make amdgpu device attr_update() function more efficient

2024-03-27 Thread Lazar, Lijo
On 3/26/2024 2:32 PM, Yang Wang wrote: > add a new enumeration type to identify device attribute node, > this method is relatively more efficient compared with 'strcmp' in > update_attr() function. > > Signed-off-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 +-- >

Re: [PATCH 1/3] amd/amdkfd: add a function to wait no process running in kfd

2024-03-26 Thread Lazar, Lijo
On 3/23/2024 1:27 AM, Zhigang Luo wrote: > Signed-off-by: Zhigang Luo > Change-Id: I2a98d513c26107ac76ecf20e951c188afbc7ede6 > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 20 > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 - >

Re: [PATCH 4/5] drm/amdgpu: Add support for BAMACO mode checking

2024-03-26 Thread Lazar, Lijo
On 3/26/2024 2:59 PM, Lazar, Lijo wrote: > > > On 3/25/2024 3:45 PM, Ma Jun wrote: >> Optimize the code to add support for BAMACO mode checking >> >> Signed-off-by: Ma Jun >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- >>

Re: [PATCH 4/5] drm/amdgpu: Add support for BAMACO mode checking

2024-03-26 Thread Lazar, Lijo
On 3/25/2024 3:45 PM, Ma Jun wrote: > Optimize the code to add support for BAMACO mode checking > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 74 +++-- >

Re: [PATCH 3/5] drm/amdgpu/pm: Add support for MACO flag checking

2024-03-26 Thread Lazar, Lijo
On 3/25/2024 3:45 PM, Ma Jun wrote: > Add support for MACO flag checking. > MACO mode only works if BACO is supported. > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >

Re: [PATCH] drm/amdgpu: Fix truncation in smu_v11_0_init_microcode

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:33 PM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the smu_v11_0_init_microcode > function. we ensure that fw_name can accommodate the maximum possible > string size > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:24 PM, Srinivasan Shanmugam wrote: > The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for > ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53 > characters. > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function

Re: [PATCH] drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode()

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:02 PM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode > function. This would ensure that the total number of characters being > written into fw_name does not exceed its size of 40. > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix truncation issues in gfx_v9_0.c

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 11:54 AM, Srinivasan Shanmugam wrote: > The size of fw_name is increased to ensure that it can accommodate > the maximum possible size of the string being written into it. > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’:

Re: [PATCH] drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 11:16 AM, Srinivasan Shanmugam wrote: > The snprintf function is used to write a formatted string into fw_name. > The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced > by the string in ucode_prefix and the second %s is replaced by either > "_2" or "1" depending

Re: [PATCH v2] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 10:29 AM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init > function. This would ensure that the total number of characters being > written into fw_name does not exceed its size of 40. > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 12:28 PM, Ma, Jun wrote: > > > On 3/20/2024 9:38 PM, Lazar, Lijo wrote: >> >> >> On 3/20/2024 6:54 PM, Alex Deucher wrote: >>> On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: >>>> >>>> Because of the logic error,

Re: [PATCH] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 8:28 PM, SRINIVASAN SHANMUGAM wrote: > > On 3/20/2024 3:12 PM, Lazar, Lijo wrote: >> >> On 3/20/2024 2:15 PM, Srinivasan Shanmugam wrote: >>> The issue was present in the lines where 'fw_name' was being formatted. >>> This fix ensures that the o

Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 6:54 PM, Alex Deucher wrote: > On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: >> >> Because of the logic error, Arcturus and vega20 currently >> use the AMDGPU_RUNPM_NONE for runtime pm even though they >> support BACO. So, the code is optimized to fix this error. >> >>

Re: [PATCH] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 2:15 PM, Srinivasan Shanmugam wrote: > The issue was present in the lines where 'fw_name' was being formatted. > This fix ensures that the output is not truncated > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c: In function ‘amdgpu_vcn_early_init’: >

Re: [PATCH] drm/amdgpu: refactor code to reuse system information

2024-03-19 Thread Lazar, Lijo
On 3/19/2024 7:27 PM, Khatri, Sunil wrote: > > On 3/19/2024 7:19 PM, Lazar, Lijo wrote: >> >> On 3/19/2024 6:02 PM, Sunil Khatri wrote: >>> Refactor the code so debugfs and devcoredump can reuse >>> the common information and avoid unnecessary copy of it. >

Re: [PATCH] drm/amdgpu: refactor code to reuse system information

2024-03-19 Thread Lazar, Lijo
On 3/19/2024 6:02 PM, Sunil Khatri wrote: > Refactor the code so debugfs and devcoredump can reuse > the common information and avoid unnecessary copy of it. > > created a new file which would be the right place to > hold functions which will be used between sysfs, debugfs > and devcoredump. >

RE: [PATCH] drm/amdgpu: Fix truncation issues in smu_v13_0_init_microcode

2024-03-17 Thread Lazar, Lijo
[Public] Reviewed-by: Lijo Lazar Thanks, Lijo -----Original Message----- From: SHANMUGAM, SRINIVASAN Sent: Saturday, March 16, 2024 10:20 PM To: Koenig, Christian ; Deucher, Alexander Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN ; Lazar, Lijo Subject: [PATCH] drm/amdgpu: Fix

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 5:45 PM, Ma, Le wrote: > [AMD Official Use Only - General] > > > > >> -----Original Message----- >> From: Lazar, Lijo <Lijo.Lazar@amd.com> >> Sent: Friday, March 15, 2024 6:14 PM >> To: Ma, Le <_L

Re: [PATCH] drm/amdgpu: trigger flr_work if reading pf2vf data failed

2024-03-15 Thread Lazar, Lijo
On 3/14/2024 10:24 PM, Zhigang Luo wrote: > if reading pf2vf data failed 5 times continuously, it means something is > wrong. Need to trigger flr_work to recover the issue. > > also use dev_err to print the error message to get which device has > issue and add warning message if waiting

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 3:43 PM, Lazar, Lijo wrote: > > > On 3/15/2024 2:46 PM, Le Ma wrote: >> To fix the entity rq NULL issue. This setting has been moved to upper level. >> > > Need to call amdgpu_ttm_set_buffer_funcs_status(adev, true/false) in > mode-2 reset handlers

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 2:46 PM, Le Ma wrote: > To fix the entity rq NULL issue. This setting has been moved to upper level. > Need to call amdgpu_ttm_set_buffer_funcs_status(adev, true/false) in mode-2 reset handlers as well. Thanks, Lijo > Fixes b70438004a14 ("drm/amdgpu: move buffer funcs setting

Re: [PATCH v2 1/2] drm/amd/pm: Update SMUv13.0.6 PMFW headers

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 1:13 PM, Asad Kamal wrote: > Update PMFW interface headers for updated metrics table > with pcie link speed and pcie link width > > Signed-off-by: Asad Kamal Series is - Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH 2/2] drm/amd/pm: Use metric table for pcie speed/width

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 11:11 AM, Asad Kamal wrote: > Report pcie link speed/width using metric table in case > of one vf & if pmfw support is available, else report directly from > registers in case of pf. Skip reporting it for other cases. > > Signed-off-by: Asad Kamal > --- >

Re: [PATCH 1/9] drm/amd/pm: Add support for DPM policies

2024-03-13 Thread Lazar, Lijo
This one is missing some NULL checks. Will send a v2. Thanks, Lijo On 3/13/2024 4:32 PM, Lijo Lazar wrote: > Add support to set/get information about different DPM policies. The > support is only available on SOCs which use swsmu architecture. > > A DPM policy type may be defined with different

Re: [PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Lazar, Lijo
On 3/14/2024 1:19 AM, Felix Kuehling wrote: > > On 2024-03-13 5:41, Lijo Lazar wrote: >> Check if the device is present in the bus before trying to recover. It >> could be that device itself is lost from the bus in some hang >> situations. >> >> Signed-off-by: Lijo Lazar >> --- >>

Re: [PATCH] drm/amdgpu/pm: Fix the ppfeature value

2024-03-12 Thread Lazar, Lijo
On 3/13/2024 8:15 AM, Ma, Jun wrote: > > > On 3/12/2024 8:57 PM, Lazar, Lijo wrote: >> >> >> On 3/12/2024 4:29 PM, Ma Jun wrote: >>> Sometimes user may want to enable the od feature >>> by setting ppfeaturemask when loading amdgpu driver. >>>

Re: [PATCH] drm/amdgpu/pm: Fix the ppfeature value

2024-03-12 Thread Lazar, Lijo
On 3/12/2024 4:29 PM, Ma Jun wrote: > Sometimes user may want to enable the od feature > by setting ppfeaturemask when loading amdgpu driver. > However, not all ASICs support this feature. > So we need to restore the ppfeature value and print > a warning info. > > Signed-off-by: Ma Jun > --- >

Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process restore

2024-03-11 Thread Lazar, Lijo
On 3/8/2024 10:17 PM, Felix Kuehling wrote: > On 2024-03-08 11:22, Mukul Joshi wrote: >> In certain situations, some apps can import a BO multiple times >> (through IPC for example). To restore such processes successfully, >> we need to tell drm to ignore duplicate BOs. >> While at it, also add

Re: [PATCH v2] drm/amdgpu/pm: Fix NULL pointer dereference when get power limit

2024-03-08 Thread Lazar, Lijo
On 3/8/2024 3:21 PM, Ma Jun wrote: > Because powerplay_table initialization is skipped under > sriov case, We set default lower and upper OD value to > avoid NULL pointer issue. > > Also, It's necessary to check od capability before > using the power limit value from powerplay_table. > >

Re: [PATCH] drm/amdgpu/pm: Fix NULL pointer dereference when set/get power limit

2024-03-06 Thread Lazar, Lijo
On 3/7/2024 7:42 AM, Ma, Jun wrote: > Hi Lijo, > > On 3/6/2024 7:16 PM, Lazar, Lijo wrote: >> >> >> On 3/6/2024 3:56 PM, Ma Jun wrote: >>> Because powerplay_table initialization is skipped under >>> sriov case, We set default lower and u

Re: [PATCH] drm/amdgpu/pm: Fix NULL pointer dereference when set/get power limit

2024-03-06 Thread Lazar, Lijo
On 3/6/2024 3:56 PM, Ma Jun wrote: > Because powerplay_table initialization is skipped under > sriov case, We set default lower and upper OD value to > avoid NULL pointer issue. pp_od_clk_voltage is not enabled in SRIOV (except for GC 9.4.3 one VF mode). Since the interface is not available

Re: [PATCH 1/2] drm/amdgpu: Add HDP flush to VCN v4.0.3

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:44 PM, Christian König wrote: > Am 05.03.24 um 10:01 schrieb Lazar, Lijo: >> On 3/5/2024 2:22 PM, Christian König wrote: >>> Am 05.03.24 um 07:40 schrieb Lijo Lazar: >>>> VCN 4.0.3 cannot trigger HDP flush with RRMT enabled. Instead, trigger >

Re: [PATCH 2/2] drm/amdgpu: Use offsets local to VCN in VF

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:48 PM, Christian König wrote: > Am 05.03.24 um 10:03 schrieb Lazar, Lijo: >> >> On 3/5/2024 2:24 PM, Christian König wrote: >>> >>> Am 05.03.24 um 07:40 schrieb Lijo Lazar: >>>> For VCN 4.0.3, use only the local addressing scheme wh

Re: [PATCH 2/2] drm/amdgpu: Use offsets local to VCN in VF

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:24 PM, Christian König wrote: > > > Am 05.03.24 um 07:40 schrieb Lijo Lazar: >> For VCN 4.0.3, use only the local addressing scheme while in VF >> mode. This includes addressing scheme used for HUB offsets. >> >> Signed-off-by: Lijo Lazar >> --- >>  

Re: [PATCH 1/2] drm/amdgpu: Add HDP flush to VCN v4.0.3

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:22 PM, Christian König wrote: > Am 05.03.24 um 07:40 schrieb Lijo Lazar: >> VCN 4.0.3 cannot trigger HDP flush with RRMT enabled. Instead, trigger >> HDP flush from host side before ringing doorbell. > > Well that won't work like that. > > The HDP flush is supposed to be emitted

Re: [PATCH] drm/amdgpu: Fix potential Spectre vulnerability in amdgpu_gfx_parse_disable_cu()

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 7:52 PM, Christian König wrote: > Am 01.03.24 um 15:01 schrieb Lazar, Lijo: >> On 3/1/2024 6:15 PM, Srinivasan Shanmugam wrote: >>> The 'mask' array could be used in a way that would make the code >>> vulnerable to a Spectre attack. The issue is

Re: [PATCH] drm/amdgpu: Fix potential Spectre vulnerability in amdgpu_gfx_parse_disable_cu()

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 6:15 PM, Srinivasan Shanmugam wrote: > The 'mask' array could be used in a way that would make the code > vulnerable to a Spectre attack. The issue is likely related to the fact > that the 'mask' array is being indexed using values that are derived > from user input (the 'se' and
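For context, the usual way to harden an array lookup whose index derives from user input is to clamp the index without a branch after the bounds check; in the kernel this is what array_index_nospec() from <linux/nospec.h> is for. The snippet below is only a userspace sketch of that masking idea, not the change made in this patch. index_mask(), clamp_index_nospec(), lookup_mask() and the 4x2 table dimensions are hypothetical names chosen for illustration, and the mask arithmetic assumes sizes no larger than LONG_MAX and an arithmetic right shift on signed values (as GCC and Clang provide).

```c
#include <limits.h>
#include <stddef.h>

#define NUM_SE 4 /* hypothetical table dimensions, for illustration only */
#define NUM_SH 2

/* All ones when index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	/* The top bit of v is set exactly when index >= size. */
	long v = (long)(index | (size - 1UL - index));

	/* Assumes arithmetic right shift of a negative signed long. */
	return (unsigned long)(~v >> (sizeof(long) * CHAR_BIT - 1));
}

/* Returns index unchanged when it is in range, 0 otherwise. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}

/* Returns 0 and stores the entry on success, -1 if se or sh is out of range. */
static int lookup_mask(unsigned int mask[NUM_SE][NUM_SH],
		       unsigned long se, unsigned long sh, unsigned int *out)
{
	if (se >= NUM_SE || sh >= NUM_SH)
		return -1;

	/* Clamp again so a mispredicted bounds check cannot index past the table. */
	se = clamp_index_nospec(se, NUM_SE);
	sh = clamp_index_nospec(sh, NUM_SH);
	*out = mask[se][sh];
	return 0;
}
```

A userspace compiler is free to turn the masking back into a branch, so this only shows the shape of the technique; kernel code should use the existing helper rather than a hand-rolled mask.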

Re: [PATCH] drm/amdgpu/pm: Fix the error of pwm1_enable setting

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 1:15 PM, Ma Jun wrote: > Fix the pwm_mode value error which used for > pwm1_enable setting > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-29 Thread Lazar, Lijo
On 2/29/2024 4:40 PM, Ma, Jun wrote: > Hi Lijo, > > On 2/29/2024 3:33 PM, Lazar, Lijo wrote: >> >> >> On 2/29/2024 11:49 AM, Ma Jun wrote: >>> Check return value of amdgpu_device_baco_enter/exit and print >>> warning message because these errors ma

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-28 Thread Lazar, Lijo
On 2/29/2024 11:49 AM, Ma Jun wrote: > Check return value of amdgpu_device_baco_enter/exit and print > warning message because these errors may cause runtime resume failure > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 -- > 1 file

Re: [PATCH] drm/amdgpu: Removed used parameter

2024-02-28 Thread Lazar, Lijo
On 2/27/2024 9:23 PM, Harish Kasiviswanathan wrote: > Also passing adev is misleading if BO is associated with different adev. > In this case BO is mapped to a different device > Looks like a typo in subject - unused? Thanks, Lijo > Signed-off-by: Harish Kasiviswanathan > --- >

Re: [PATCH 1/2] drm/amdgpu: Use rpm_mode flag instead of checking it again for rpm

2024-02-28 Thread Lazar, Lijo
On 2/28/2024 5:14 PM, Ma Jun wrote: > Because the rpm_mode flag is already set when the driver > is initialized, we use it directly for runtime suspend/resume > instead of checking it again > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH 2/2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-28 Thread Lazar, Lijo
On 2/28/2024 5:14 PM, Ma Jun wrote: > Check return value of amdgpu_device_baco_enter/exit and print > warning message because these errors may cause runtime resume failure > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++-- > 1 file changed, 6

Re: [PATCH] drm/amdgpu: Fix multiple truncation issues in multiple driver files

2024-02-28 Thread Lazar, Lijo
On 2/27/2024 10:05 PM, Srinivasan Shanmugam wrote: > Fixes snprintf function by writing more bytes into various buffers than > they can hold. > > In several files - smu_v13_0.c, gfx_v11_0.c, gfx_v10_0.c, gfx_v9_0.c, > and amdgpu_mes.c. They were related to different directives, such as > '%s',
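The class of warning described here arises when snprintf() output can be longer than the destination buffer. As a rough userspace illustration (not the code from this patch), the return value of snprintf() tells the caller whether the output was truncated. format_fw_name(), its arguments and the "%s_%s.bin" format below are hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format "<prefix>_<suffix>.bin" into buf.
 * Returns 0 on success, -1 if the output did not fit or formatting failed.
 */
static int format_fw_name(char *buf, size_t len,
			  const char *prefix, const char *suffix)
{
	int n = snprintf(buf, len, "%s_%s.bin", prefix, suffix);

	/*
	 * snprintf() returns the number of characters it would have written,
	 * excluding the terminating NUL, so n >= len means truncation.
	 */
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

On truncation the buffer still holds a NUL-terminated prefix of the intended string, so the caller can decide whether to report an error or to size the buffer for the worst case.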
