Re: [PATCH] drm/amdgpu: Move SR-IOV check into amdgpu_gfx_sysfs_compute_init

2024-06-07 Thread Lazar, Lijo
On 6/7/2024 12:31 PM, SRINIVASAN SHANMUGAM wrote: > > On 6/6/2024 10:58 PM, Lazar, Lijo wrote: >> On 6/6/2024 5:35 PM, Srinivasan Shanmugam wrote: >>> Previously, this check was performed in the gfx_v9_4_3_sw_init function, >>> and the amdgpu_gfx_sysfs_compute

Re: [PATCH] drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc

2024-06-06 Thread Lazar, Lijo
On 6/6/2024 9:13 PM, Eric Huang wrote: > amdgpu_job_ring may return NULL, which causes kernel NULL > pointer error, using another way to print ring name instead > of ring->name. > > Suggested-by: Lijo Lazar > Signed-off-by: Eric Huang Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH] drm/amdgpu: Move SR-IOV check into amdgpu_gfx_sysfs_compute_init

2024-06-06 Thread Lazar, Lijo
On 6/6/2024 5:35 PM, Srinivasan Shanmugam wrote: > Previously, this check was performed in the gfx_v9_4_3_sw_init function, > and the amdgpu_gfx_sysfs_compute_init function was only called if the > GPU was not a VF in SR-IOV mode. This is because the sysfs entries > created by

Re: [PATCH 2/2] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-04 Thread Lazar, Lijo
On 6/3/2024 11:42 PM, Eric Huang wrote: > reset cause is requested by customer as additional > info for gpu reset smi event. > > v2: integrate reset sources suggested by Lijo Lazar > > Signed-off-by: Eric Huang This series is Reviewed-by: Lijo Lazar I think SMI needs to get all

RE: [PATCH] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-03 Thread Lazar, Lijo
[AMD Official Use Only - AMD Internal Distribution Only] Hi Eric, To consider other reset cases also, you may have something like attached. Thanks, Lijo -Original Message- From: amd-gfx On Behalf Of Eric Huang Sent: Friday, May 31, 2024 8:38 PM To: amd-gfx@lists.freedesktop.org Cc:

Re: [PATCH] drm/amdkfd: select CONFIG_CRC16

2024-05-28 Thread Lazar, Lijo
On 5/28/2024 5:20 PM, Arnd Bergmann wrote: > From: Arnd Bergmann > > The amdkfd support fails to link when CONFIG_CRC16 is disabled: > > aarch64-linux-ld: drivers/gpu/drm/amd/amdkfd/kfd_topology.o: in function > `kfd_topology_add_device': > kfd_topology.c:(.text+0x3a4c): undefined reference

Re: [PATCH] drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw

2024-05-22 Thread Lazar, Lijo
anks, Lijo > Thanks, > Victor > > -Original Message- > From: Lazar, Lijo > Sent: Wednesday, May 22, 2024 2:14 PM > To: Zhao, Victor ; amd-gfx@lists.freedesktop.org > Subject: RE: [PATCH] drm/amd/amdgpu: fix the inst passed to > amdgpu_virt_rlcg_reg_rw > > [AMD Off

RE: [PATCH] drm/amd/amdgpu: fix the inst passed to amdgpu_virt_rlcg_reg_rw

2024-05-22 Thread Lazar, Lijo
[AMD Official Use Only - AMD Internal Distribution Only] Hi Victor, Could you check if an approach like the attached one helps? Thanks, Lijo -Original Message- From: Zhao, Victor Sent: Wednesday, May 22, 2024 11:13 AM To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Lazar, Lijo

Re: [PATCH 1/4 V2] drm/amdgpu: fix invadate operation for umsch

2024-05-21 Thread Lazar, Lijo
On 5/22/2024 7:49 AM, Zhang, Jesse(Jie) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > Hi Lijo > > -Original Message- > From: Lazar, Lijo > Sent: Tuesday, May 21, 2024 4:20 PM > To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org

Re: [PATCH 1/4 V2] drm/amdgpu: fix invadate operation for umsch

2024-05-21 Thread Lazar, Lijo
On 5/21/2024 12:46 PM, Jesse Zhang wrote: > Since the type of data_size is uint32_t, adev->umsch_mm.data_size - 1 >> 16 > >> 16 is 0 > regardless of the values of its operands > > So removing the operations upper_32_bits and lower_32_bits. > > Signed-off-by: Jesse Zhang > Suggested-by: Tim
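The claim in the quoted commit message can be checked with a small userspace sketch. The two macros below are stand-ins for the kernel's upper_32_bits() and lower_32_bits() helpers, reproduced from memory as an assumption, and upper_half_of_u32() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers (assumed, written from memory). */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))

/* Hypothetical helper: upper half of (data_size - 1) when data_size is 32-bit. */
static uint32_t upper_half_of_u32(uint32_t data_size)
{
    uint32_t last = data_size - 1;   /* still a 32-bit value */

    /* The lower half already carries every bit of a 32-bit value. */
    assert(lower_32_bits(last) == last);

    /* Shifting a 32-bit value right by 16 twice leaves no bits set. */
    return upper_32_bits(last);
}
```

Because the result is 0 for every 32-bit input, splitting the value into upper and lower halves adds nothing, which appears to be the reason the patch drops both helpers.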

Re: [PATCH v2] drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring

2024-05-21 Thread Lazar, Lijo
On 5/21/2024 1:07 PM, Srinivasan Shanmugam wrote: > This commit fixes a format truncation issue caused by the snprintf > function potentially writing more characters into the ring->name buffer > than it can hold, in the amdgpu_gfx_kiq_init_ring function > > The issue occurred because the

Re: [PATCH] drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring

2024-05-20 Thread Lazar, Lijo
On 5/21/2024 10:13 AM, Srinivasan Shanmugam wrote: > This commit fixes a format truncation issue caused by the snprintf > function potentially writing more characters into the ring->name buffer > than it can hold, in the amdgpu_gfx_kiq_init_ring function > > The issue occurred because the '%d'

Re: [PATCH] drm/amd/amdgpu: fix the inst passed to reg read write under sriov

2024-05-20 Thread Lazar, Lijo
On 5/20/2024 4:44 PM, Victor Zhao wrote: > the inst passed to reg read/write should be physical instance. > Fix the mismatched code. > > Signed-off-by: Victor Zhao > --- > .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 ++--- > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 2 +- >

Re: [PATCH 1/2] Revert "drm/amd/pm: Use gpu_metrics_v1_6 for SMUv13.0.6"

2024-05-19 Thread Lazar, Lijo
On 5/20/2024 10:31 AM, Asad Kamal wrote: > Remove gpu_metrics_v1_6 usage for SMUv13.0.6 temporarily and use > gpu_metrics_v1_5 until tool support is ready for it. > > This reverts commit e6efb71ae640eada28f44cc97aa79e8ae4901e63. > > Signed-off-by: Asad Kamal Series is Reviewed-by:

Re: [PATCH v4 00/10] Add PM policy interfaces

2024-05-15 Thread Lazar, Lijo
On 5/14/2024 4:35 PM, Lijo Lazar wrote: > This series adds APIs to get the supported PM policies and also set them. A PM > policy type is a predefined policy type supported by an SOC and each policy > may > define two or more levels to choose from. A user can select the appropriate > level

Re: [PATCH 2/2 v2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 12:28 PM, Jesse Zhang wrote: > To avoid warning problems, drop index and > use PPSMC_MSG_GfxDriverReset instead of index for aldebaran. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++-- > 1

Re: [PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 12:37 PM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > -Original Message- > From: amd-gfx On Behalf Of Lazar, Lijo > Sent: Tuesday, May 14, 2024 2:07 PM > To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop

Re: [PATCH] drm/amdgpu/pm: Drop hard-code value of usTMax

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 9:43 AM, Ma Jun wrote: > Drop the hard-coded value of usTMax because we read this > value from the fan table below. > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/pm/powerplay/hwmgr/process_pptables_v1_0.c | 2 -- > 1 file changed, 2

Re: [PATCH v2] drm/amdgpu: Fix the null pointer dereference to ras_manager

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 9:42 AM, Ma Jun wrote: > Check ras_manager before using it > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git

Re: [PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Lazar, Lijo
On 5/14/2024 11:34 AM, Jesse Zhang wrote: > To avoid warning problems, drop index and > use PPSMC_MSG_GfxDriverReset instead of index for aldebaran. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++-- > 1

Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle

2024-05-13 Thread Lazar, Lijo
On 5/14/2024 6:30 AM, Ma, Jun wrote: > Hi Lijo & Kevin, thanks for review, will drop this patch > In the original function below check is there. if (!handle || !info || type >= ACA_ERROR_TYPE_COUNT) return -EINVAL; So moving this to a later stage is still valid.

Re: [PATCH 05/22] drm/amd/pm: check specific index for aldebaran

2024-05-13 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 4:27 PM, Lazar, Lijo wrote: > > > On 5/10/2024 8:20 AM, Jesse Zhang wrote: >> Check for specific indexes that may be invalid values. >> >> Signed-off-by: Jesse Zhang >> --- >> drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c

Re: [PATCH 09/22] drm/amd/pm: check specific index for smu13

2024-05-13 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Check for specific indexes that may be invalid values. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git

Re: [PATCH 3/5] drm/amdgpu: Fix null pointer dereference to aca_handle

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check handle pointer before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c >

Re: [PATCH 2/5] drm/amdgpu: Fix the null pointer dereference to ras_manager

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 2:26 PM, Ma Jun wrote: > Check ras_manager before using it > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c >

Re: [PATCH v3] drm/amdgpu: Add Ring Hang Events

2024-05-13 Thread Lazar, Lijo
On 5/13/2024 9:44 AM, Ori Messinger wrote: > This patch adds 'ring hang' events to the driver. > This is done by adding a 'reset_ring_hang' bool variable to the > struct 'amdgpu_reset_context' in the amdgpu_reset.h file. > The purpose for this 'reset_ring_hang' variable is whenever a GPU >

Re: [PATCH] drm/amdkfd: Ensure gpu_id is unique

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:36 AM, Harish Kasiviswanathan wrote: > gpu_id needs to be unique for user space to identify GPUs via KFD > interface. In the current implementation there is a very small > probability of having non unique gpu_ids. > > v2: Add check to confirm if gpu_id is unique. If not unique,

Re: [PATCH 19/22 V2] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:56 PM, Jesse Zhang wrote: > Checks the partition mode and returns an error for an invalid mode. > > Signed-off-by: Jesse Zhang > Suggested-by: Lijo Lazar > --- > drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git

Re: [PATCH 19/22] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 1:09 PM, Zhang, Jesse(Jie) wrote: > [AMD Official Use Only - General] > > Hi Lijo, > > -Original Message- > From: amd-gfx On Behalf Of Lazar, Lijo > Sent: Friday, May 10, 2024 3:16 PM > To: amd-gfx@lists.freedesktop.org > Subject: Re: [P

Re: [PATCH 19/22] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Lazar, Lijo
On 5/10/2024 8:20 AM, Jesse Zhang wrote: > Dividing expression num_xcc_per_xcp which may be zero has undefined behavior. > > Signed-off-by: Jesse Zhang > --- > drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git
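The kind of guard this thread discusses can be sketched in plain C. This is only an illustration under stated assumptions: xcp_count() and its parameters are hypothetical names, not the driver's code, and the real patch may validate the partition mode differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the driver: work out how many
 * partitions there are, rejecting a zero divisor instead of dividing by it.
 */
static int xcp_count(int num_xcc, int num_xcc_per_xcp, int *out)
{
    assert(out != NULL);

    if (num_xcc_per_xcp <= 0)
        return -EINVAL;   /* invalid mode: nothing sensible to divide by */

    *out = num_xcc / num_xcc_per_xcp;
    return 0;
}
```

Returning an error code keeps the decision about how to handle an invalid mode with the caller, which is the direction the V2 posting of this patch describes.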

Re: [PATCH] drm/amdgpu: Fix truncation by resizing ucode_prefix in imu_v12_0_init_microcode

2024-05-08 Thread Lazar, Lijo
On 5/7/2024 10:14 PM, Srinivasan Shanmugam wrote: > This commit fixes potential truncation when writing the string _imu.bin > into the fw_name buffer in the imu_v12_0_init_microcode function in the > imu_v12_0.c file > > The ucode_prefix size was reduced from 30 to 15 to ensure the snprintf >

Re: [PATCH] drm/amdgpu: Fix buffer size to prevent truncation in gfx_v12_0_init_microcode

2024-05-08 Thread Lazar, Lijo
On 5/7/2024 10:14 PM, Srinivasan Shanmugam wrote: > This commit addresses multiple warnings in the gfx_v12_0_init_microcode > function in the gfx_v12_0.c file. The warnings were related to potential > truncation when writing the strings _pfp.bin, _me.bin, _rlc.bin, and > _mec.bin into the

Re: [PATCH] drm/amdkfd: Ensure gpu_id is unique

2024-05-06 Thread Lazar, Lijo
On 5/4/2024 3:36 AM, Harish Kasiviswanathan wrote: > gpu_id needs to be unique for user space to identify GPUs via KFD > interface. In the current implementation there is a very small > probability of having non unique gpu_ids. > > v2: Add check to confirm if gpu_id is unique. If not unique,

Re: [PATCH] drm/amdgpu: Assign correct bits for SDMA HDP flush

2024-05-06 Thread Lazar, Lijo
On 5/7/2024 6:00 AM, Harry Wentland wrote: > This patch is causing crashes of Manor Lords on my Navi 21 on the 6.8.9 > stable kernel. It leads to an assertion failure in wine: > > File: ../src-wine/dlls/winevulkan/loader_thunks.c > Line: 3621 > > Expression "!status && vkEndCommandBuffer"" >

Re: [PATCH] drm/amd/amdgpu: Check tbo resource pointer

2024-05-02 Thread Lazar, Lijo
On 5/2/2024 7:01 PM, Asad Kamal wrote: > Validate tbo resource pointer, skip if NULL > > Signed-off-by: Asad Kamal > Reviewed-by: Christian König Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++- > 1 file changed, 2 insertions(+), 1

Re: [PATCH v3 1/4] drm/amdgpu: Fix two reset triggered in a row

2024-04-30 Thread Lazar, Lijo
On 4/26/2024 9:27 AM, Yunxiang Li wrote: > Sometimes a hung GPU causes multiple reset sources to schedule resets. > The second source will be able to trigger an unnecessary reset if it > schedules after we call amdgpu_device_stop_pending_resets. > > Move amdgpu_device_stop_pending_resets to

Re: [PATCH] drm/amdgpu: avoid reading vf2pf info size from FB

2024-04-30 Thread Lazar, Lijo
On 4/30/2024 7:53 PM, Zhigang Luo wrote: > VF can't access FB when host is doing mode1 reset. Using sizeof to get > vf2pf info size, instead of reading it from vf2pf header stored in FB. > > Signed-off-by: Zhigang Luo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- > 1 file changed,

Re: [PATCH] drm/amdgpu: add mutex to protect ras shared memory

2024-04-29 Thread Lazar, Lijo
On 4/28/2024 12:38 PM, YiPeng Chai wrote: > Add mutex to protect ras shared memory. > > Signed-off-by: YiPeng Chai > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c| 121 ++--- > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h| 1 + >

Re: [PATCH] drm/amdgpu: skip ip dump if devcoredump flag is set

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 3:53 PM, Sunil Khatri wrote: > Do not dump the ip registers during driver reload > in passthrough environment. > > Signed-off-by: Sunil Khatri > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git

Re: [PATCH v2 2/2] drm/amdgpu: Fix the uninitialized variable warning

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 3:30 PM, Ma Jun wrote: > Initialize the phy_id to 0 to fix the warning of > "Using uninitialized value phy_id" > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_securedisplay.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH v2] drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()

2024-04-25 Thread Lazar, Lijo
On 4/25/2024 12:05 PM, Srinivasan Shanmugam wrote: > The function gfx_v9_4_3_init_microcode in gfx_v9_4_3.c was generating > warnings about potential truncation of output when using the snprintf function. > The issue was due to the size of the buffer 'ucode_prefix' being too > small to accommodate the

Re: [PATCH] drm/amdgpu: Fix buffer size in gfx_v9_4_3_init_ cp_compute_microcode() and rlc_microcode()

2024-04-24 Thread Lazar, Lijo
On 4/23/2024 7:13 AM, Srinivasan Shanmugam wrote: > The buffer size is determined by the declaration char fw_name[30]; This > means fw_name can hold up to 30 characters, including the null character > that marks the end of the string. > > The string to be written is "amdgpu/%s_mec.bin" or
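The truncation described above can be observed at run time through snprintf()'s return value. The sketch below is a userspace illustration only: format_fw_name() is a hypothetical helper, and the 30-byte buffer and the "amdgpu/%s_mec.bin" format are taken from the quoted message.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a firmware path into buf and report whether
 * it fit. snprintf() returns the length the full string would have had,
 * so a result greater than or equal to size means the output was cut short.
 */
static int format_fw_name(char *buf, size_t size, const char *prefix)
{
    int needed;

    assert(buf != NULL && size > 0);

    needed = snprintf(buf, size, "amdgpu/%s_mec.bin", prefix);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 30-byte buffer, the fixed parts of the format take 15 characters plus the terminating null, so any prefix of 15 or more characters no longer fits.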

Re: [PATCH] drm/amdgpu: Fix two reset triggered in a row

2024-04-22 Thread Lazar, Lijo
On 4/23/2024 1:15 AM, Yunxiang Li wrote: > Reset request from KFD is missing a check for if a reset is already in > progress, this causes a second reset to be triggered right after the > previous one finishes. Add the check to align with the other reset sources. > > Signed-off-by: Yunxiang Li

Re: [PATCH v4] drm/amdgpu: Fix snprintf buffer size in smu_v14_0_init_microcode

2024-04-22 Thread Lazar, Lijo
On 4/19/2024 9:14 PM, Srinivasan Shanmugam wrote: > This commit addresses buffer overflow in the smu_v14_0_init_microcode > function. The issue was about the snprintf function writing more bytes > into the fw_name buffer than it can hold. > > The line of code is: > > snprintf(fw_name,

Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 4:52 PM, Christian König wrote: > On 22.04.24 at 11:37, Lazar, Lijo wrote: >> >> On 4/22/2024 2:59 PM, Christian König wrote: >>> On 22.04.24 at 10:47, Jack Xiao wrote: >>>> Delete fence fallback timer to fix the random >>>>

Re: [PATCH v2] drm/amdgpu/mes: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 3:09 PM, Jack Xiao wrote: > Delete fence fallback timer to fix the random > use-after-free issue. > > v2: move to amdgpu_mes.c > > Signed-off-by: Jack Xiao Acked-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + > 1 file changed, 1

Re: [PATCH] drm/amdgpu: fix use-after-free issue

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 2:59 PM, Christian König wrote: > On 22.04.24 at 10:47, Jack Xiao wrote: >> Delete fence fallback timer to fix the random >> use-after-free issue. > > That's already done in amdgpu_fence_driver_hw_fini() and absolutely > shouldn't be in amdgpu_ring_fini(). > > And the

Re: [PATCH] drm/amdgpu: init microcode chip name from ip versions

2024-04-22 Thread Lazar, Lijo
On 4/22/2024 11:23 AM, Le Ma wrote: > To adapt to different gc versions in gfx_v9_4_3.c file. > > Change-Id: Ib4465aade0dcbbcc43318c6dc865f813c5411097 > Signed-off-by: Le Ma > Reviewed-by: Hawking Zhang Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c

Re: [PATCH] drm/amdgpu: Update CGCG settings for GFXIP 9.4.3

2024-04-21 Thread Lazar, Lijo
On 4/22/2024 6:42 AM, Rajneesh Bhardwaj wrote: > Tune coarse grain clock gating idle threshold and rlc idle timeout to > achieve better kernel launch latency. > > Signed-off-by: Rajneesh Bhardwaj Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 8

RE: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-17 Thread Lazar, Lijo
). If that happens, then the purpose of the patch - to get the context of a device hang - is lost. We may not even get a proper dmesg log. Thanks, Lijo -Original Message- From: Khatri, Sunil Sent: Wednesday, April 17, 2024 9:42 PM To: Lazar, Lijo ; Alex Deucher ; Khatri, Sunil Cc

Re: [PATCH v5 2/6] drm/amdgpu: add support of gfx10 register dump

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 9:21 PM, Alex Deucher wrote: > On Wed, Apr 17, 2024 at 5:38 AM Sunil Khatri wrote: >> >> Adding gfx10 gc registers to be used for register >> dump via devcoredump during a gpu reset. >> >> Signed-off-by: Sunil Khatri > > Reviewed-by: Alex Deucher > >> --- >>

Re: [PATCH] drm/amdgpu/pm: Remove gpu_od if it's an empty directory

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 11:23 AM, Ma Jun wrote: > gpu_od should be removed if it's an empty directory > > Signed-off-by: Ma Jun > Reported-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c >

Re: [PATCH] drm/amdgpu/pm: Print od status info

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 3:10 PM, Ma Jun wrote: > Print the od status info if it's not supported. > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c > b/drivers/gpu/drm/amd/pm/amdgpu_pm.c >

Re: [PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-17 Thread Lazar, Lijo
On 4/17/2024 1:14 PM, Khatri, Sunil wrote: > > On 4/17/2024 1:06 PM, Khatri, Sunil wrote: >> devcoredump is used to debug gpu hangs/resets. So in normal process >> when there is a hang due to ring timeout or page fault we are doing a >> hard reset as soft reset fail in those cases. How are we

Re: [PATCH v2] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-16 Thread Lazar, Lijo
On 4/17/2024 9:43 AM, Ahmad Rehman wrote: > In passthrough environment, the driver triggers the mode-1 reset on > reload. The reset causes the core dump collection which is delayed task > and prevents driver from unloading until it is completed. Since we do > not need to collect data on "reset

Re: [PATCH] drm/amdgpu: Skip the coredump collection on reset during driver reload

2024-04-16 Thread Lazar, Lijo
On 4/17/2024 12:05 AM, Ahmad Rehman wrote: > In passthrough environment, the driver triggers the mode-1 reset on > reload. The reset causes the core dump collection which is delayed task > and prevents driver from unloading until it is completed. Since we do > not need to collect data on "reset

RE: [PATCH] drm/amdgpu: Use driver mode reset for data poison

2024-04-16 Thread Lazar, Lijo
[Public] Is this applicable for aldebaran also? Thanks, Lijo -Original Message- From: amd-gfx On Behalf Of Hawking Zhang Sent: Tuesday, April 16, 2024 11:46 AM To: amd-gfx@lists.freedesktop.org; Zhou1, Tao Cc: Zhang, Hawking Subject: [PATCH] drm/amdgpu: Use driver mode reset for data

Re: [PATCH v2] drm/amdgpu: refactoring the runtime pm mode detection code

2024-04-15 Thread Lazar, Lijo
On 4/3/2024 8:27 AM, Ma Jun wrote: > Refactor the runtime pm mode detection code to support > the two cases amdgpu_runtime_pm = 2 and 1 > > Signed-off-by: Ma Jun > Reviewed-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo > --- > v1->v2: > - Fix logic and output info (Lijo) > - Fix code

Re: [PATCH] drm/amdgpu/gfx11: properly handle regGRBM_GFX_CNTL in soft reset

2024-04-11 Thread Lazar, Lijo
On 4/8/2024 10:50 PM, Alex Deucher wrote: > Need to take the srbm_mutex and while we are here, use the > helper function soc21_grbm_select(); > > Signed-off-by: Alex Deucher Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 15 +-- > 1 file

Re: [PATCH 2/2] amd/amdgpu: improve VF recover time

2024-04-04 Thread Lazar, Lijo
On 4/3/2024 11:42 PM, Zhigang Luo wrote: > 1. change AMDGPU_VF2PF_UPDATE_MAX_RETRY_LIMIT from 30 to 5. > 2. set fatal error detected flag. > > Signed-off-by: Zhigang Luo Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + >

Re: [PATCH 1/2] drm/amd/pm: Report uclk and sclk limit

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 4:00 PM, Lazar, Lijo wrote: > > > On 4/2/2024 3:52 PM, Asad Kamal wrote: >> Report max set uclk and sclk for smu_v_13_0_6 >> > > You may rephrase as > > "Use OD (pp_od_clk_voltage) interface to report current limits, default >

Re: [PATCH 2/2] drm/amd/pm: Update uclk/sclk limit report format

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 3:52 PM, Asad Kamal wrote: > Update max set uclk and sclk reporting format for smu_v_13_0_0 > Use aldebaran instead of smu v13.0.0 - both are different. You may also add the description similar to patch 1. With those updates, Reviewed-by: Lijo Lazar Thanks, Lijo >

Re: [PATCH 1/2] drm/amd/pm: Report uclk and sclk limit

2024-04-02 Thread Lazar, Lijo
On 4/2/2024 3:52 PM, Asad Kamal wrote: > Report max set uclk and sclk for smu_v_13_0_6 > You may rephrase as "Use OD (pp_od_clk_voltage) interface to report current limits, default or those set by user, for SCLK and UCLK." Thanks, Lijo > Signed-off-by: Asad Kamal > --- >

Re: [PATCH] drm/amdgpu: refactoring the runtime pm mode detection code

2024-04-01 Thread Lazar, Lijo
On 3/29/2024 1:58 PM, Ma Jun wrote: > Refactor the runtime pm mode detection code to support > the two cases amdgpu_runtime_pm = 2 and 1 > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 68 ++

Re: [PATCH v2 2/4] drm/amd/pm: Add PMFW message and capability flags

2024-04-01 Thread Lazar, Lijo
On 4/1/2024 4:45 PM, Kamal, Asad wrote: > [AMD Official Use Only - General] > > -Original Message- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 8:06 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Wang, Yang(Kevin) >

Re: [PATCH v3 1/5] drm/amdgpu: Add a new runtime mode definition

2024-03-28 Thread Lazar, Lijo
On 3/27/2024 4:40 PM, Ma Jun wrote: > Add a new runtime pm mode AMDGPU_RUNPM_BAMACO > and related macro definition > > Signed-off-by: Ma Jun Series is Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h | 4 > 1 file changed, 4 insertions(+) > >

Re: [PATCH] drm/amdgpu: Reset dGPU if suspend got aborted

2024-03-27 Thread Lazar, Lijo
On 3/28/2024 8:49 AM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - General] > > -Original Message- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 11:06 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > > Subject:

Re: [PATCH v2 2/4] drm/amd/pm: Add PMFW message and capability flags

2024-03-27 Thread Lazar, Lijo
On 3/28/2024 8:57 AM, Wang, Yang(Kevin) wrote: > [AMD Official Use Only - General] > > -Original Message- > From: amd-gfx On Behalf Of Lijo Lazar > Sent: Thursday, March 28, 2024 10:36 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Wang,

Re: [PATCH] drm/amdgpu: make amdgpu device attr_update() function more efficient

2024-03-27 Thread Lazar, Lijo
On 3/26/2024 2:32 PM, Yang Wang wrote: > add a new enumeration type to identify device attribute node, > this method is relatively more efficient compared with 'strcmp' in > update_attr() function. > > Signed-off-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 +-- >

Re: [PATCH 1/3] amd/amdkfd: add a function to wait no process running in kfd

2024-03-26 Thread Lazar, Lijo
On 3/23/2024 1:27 AM, Zhigang Luo wrote: > Signed-off-by: Zhigang Luo > Change-Id: I2a98d513c26107ac76ecf20e951c188afbc7ede6 > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 20 > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 5 - >

Re: [PATCH 4/5] drm/amdgpu: Add support for BAMACO mode checking

2024-03-26 Thread Lazar, Lijo
On 3/26/2024 2:59 PM, Lazar, Lijo wrote: > > > On 3/25/2024 3:45 PM, Ma Jun wrote: >> Optimize the code to add support for BAMACO mode checking >> >> Signed-off-by: Ma Jun >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- >>

Re: [PATCH 4/5] drm/amdgpu: Add support for BAMACO mode checking

2024-03-26 Thread Lazar, Lijo
On 3/25/2024 3:45 PM, Ma Jun wrote: > Optimize the code to add support for BAMACO mode checking > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 74 +++-- >

Re: [PATCH 3/5] drm/amdgpu/pm: Add support for MACO flag checking

2024-03-26 Thread Lazar, Lijo
On 3/25/2024 3:45 PM, Ma Jun wrote: > Add support for MACO flag checking. > MACO mode only works if BACO is supported. > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 4 ++-- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >

Re: [PATCH] drm/amdgpu: Fix truncation in smu_v11_0_init_microcode

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:33 PM, Srinivasan Shanmugam wrote: > By reducing the size of ucode_prefix to 25 in the smu_v11_0_init_microcode > function, we ensure that fw_name can accommodate the maximum possible > string size > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:24 PM, Srinivasan Shanmugam wrote: > The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for > ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53 > characters. > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function
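The arithmetic in the quoted message (8 + 30 + 5 + 5 + 5 = 53) matches what sizeof gives for each string literal, since sizeof counts the terminating null. A minimal sketch of that sizing follows, assuming the longest name has the form "amdgpu/<prefix>_pfp_wks.bin"; the macro and helper names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdio.h>

#define UCODE_PREFIX_SIZE 30   /* size quoted in the message above */

/* sizeof on a string literal includes its null, matching the 8 and the 5s. */
#define FW_NAME_BOUND (sizeof("amdgpu/") + UCODE_PREFIX_SIZE + \
                       sizeof("_pfp") + sizeof("_wks") + sizeof(".bin"))

/* Hypothetical helper: build the longest name variant and return its length. */
static int build_fw_name(char *buf, size_t size, const char *prefix)
{
    int n = snprintf(buf, size, "amdgpu/%s_pfp_wks.bin", prefix);

    assert(n >= 0 && (size_t)n < size);   /* must fit without truncation */
    return n;
}
```

The bound is conservative: it counts one null per component although the final string needs only one, so a buffer of FW_NAME_BOUND bytes has a few bytes to spare.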

Re: [PATCH] drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode()

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 12:02 PM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode > function. This would ensure that the total number of characters being > written into fw_name does not exceed its size of 40. > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix truncation issues in gfx_v9_0.c

2024-03-22 Thread Lazar, Lijo
On 3/22/2024 11:54 AM, Srinivasan Shanmugam wrote: > The size of fw_name is increased to ensure that it can accommodate > the maximum possible size of the string being written into it. > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’:

Re: [PATCH] drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 11:16 AM, Srinivasan Shanmugam wrote: > The snprintf function is used to write a formatted string into fw_name. > The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced > by the string in ucode_prefix and the second %s is replaced by either > "_2" or "1" depending

Re: [PATCH v2] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 10:29 AM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init > function. This would ensure that the total number of characters being > written into fw_name does not exceed its size of 40. > > Fixes the below with gcc W=1: >

Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 12:28 PM, Ma, Jun wrote: > > > On 3/20/2024 9:38 PM, Lazar, Lijo wrote: >> >> >> On 3/20/2024 6:54 PM, Alex Deucher wrote: >>> On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: >>>> >>>> Because of the logic error,

Re: [PATCH] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 8:28 PM, SRINIVASAN SHANMUGAM wrote: > > On 3/20/2024 3:12 PM, Lazar, Lijo wrote: >> >> On 3/20/2024 2:15 PM, Srinivasan Shanmugam wrote: >>> The issue was present in the lines where 'fw_name' was being formatted. >>> This fix ensures that the o

Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 6:54 PM, Alex Deucher wrote: > On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: >> >> Because of the logic error, Arcturus and vega20 currently >> use the AMDGPU_RUNPM_NONE for runtime pm even though they >> support BACO. So, the code is optimized to fix this error. >> >>

Re: [PATCH] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-20 Thread Lazar, Lijo
On 3/20/2024 2:15 PM, Srinivasan Shanmugam wrote: > The issue was present in the lines where 'fw_name' was being formatted. > This fix ensures that the output is not truncated > > Fixes the below with gcc W=1: > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c: In function ‘amdgpu_vcn_early_init’: >

Re: [PATCH] drm/amdgpu: refactor code to reuse system information

2024-03-19 Thread Lazar, Lijo
On 3/19/2024 7:27 PM, Khatri, Sunil wrote: > > On 3/19/2024 7:19 PM, Lazar, Lijo wrote: >> >> On 3/19/2024 6:02 PM, Sunil Khatri wrote: >>> Refactor the code so debugfs and devcoredump can reuse >>> the common information and avoid unnecessary copy of it. >

Re: [PATCH] drm/amdgpu: refactor code to reuse system information

2024-03-19 Thread Lazar, Lijo
On 3/19/2024 6:02 PM, Sunil Khatri wrote: > Refactor the code so debugfs and devcoredump can reuse > the common information and avoid unnecessary copy of it. > > created a new file which would be the right place to > hold functions which will be used between sysfs, debugfs > and devcoredump. >

RE: [PATCH] drm/amdgpu: Fix truncation issues in smu_v13_0_init_microcode

2024-03-17 Thread Lazar, Lijo
[Public] Reviewed-by: Lijo Lazar Thanks, Lijo -----Original Message----- From: SHANMUGAM, SRINIVASAN Sent: Saturday, March 16, 2024 10:20 PM To: Koenig, Christian; Deucher, Alexander Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN; Lazar, Lijo Subject: [PATCH] drm/amdgpu: Fix

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 5:45 PM, Ma, Le wrote: > [AMD Official Use Only - General] > >> -----Original Message----- >> From: Lazar, Lijo <Lijo.Lazar@amd.com> >> Sent: Friday, March 15, 2024 6:14 PM >> To: Ma, Le <_L

Re: [PATCH] drm/amdgpu: trigger flr_work if reading pf2vf data failed

2024-03-15 Thread Lazar, Lijo
On 3/14/2024 10:24 PM, Zhigang Luo wrote: > if reading pf2vf data failed 5 times continuously, it means something is > wrong. Need to trigger flr_work to recover the issue. > > also use dev_err to print the error message to get which device has > issue and add warning message if waiting

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 3:43 PM, Lazar, Lijo wrote: > > > On 3/15/2024 2:46 PM, Le Ma wrote: >> To fix the entity rq NULL issue. This setting has been moved to upper level. >> > > Need to call amdgpu_ttm_set_buffer_funcs_status(adev, true/false) in > mode-2 reset handlers

Re: [PATCH 1/1] drm/amdgpu: drop setting buffer funcs in sdma442

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 2:46 PM, Le Ma wrote: > To fix the entity rq NULL issue. This setting has been moved to upper level. > Need to call amdgpu_ttm_set_buffer_funcs_status(adev, true/false) in mode-2 reset handlers as well. Thanks, Lijo > Fixes b70438004a14 ("drm/amdgpu: move buffer funcs setting

Re: [PATCH v2 1/2] drm/amd/pm: Update SMUv13.0.6 PMFW headers

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 1:13 PM, Asad Kamal wrote: > Update PMFW interface headers for updated metrics table > with pcie link speed and pcie link width > > Signed-off-by: Asad Kamal Series is - Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH 2/2] drm/amd/pm: Use metric table for pcie speed/width

2024-03-15 Thread Lazar, Lijo
On 3/15/2024 11:11 AM, Asad Kamal wrote: > Report pcie link speed/width using metric table in case > of one vf & if pmfw support is available, else report directly from > registers in case of pf. Skip reporting it for other cases. > > Signed-off-by: Asad Kamal > --- >

Re: [PATCH 1/9] drm/amd/pm: Add support for DPM policies

2024-03-13 Thread Lazar, Lijo
This one is missing some NULL checks. Will send a v2. Thanks, Lijo On 3/13/2024 4:32 PM, Lijo Lazar wrote: > Add support to set/get information about different DPM policies. The > support is only available on SOCs which use swsmu architecture. > > A DPM policy type may be defined with different

Re: [PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Lazar, Lijo
On 3/14/2024 1:19 AM, Felix Kuehling wrote: > > On 2024-03-13 5:41, Lijo Lazar wrote: >> Check if the device is present in the bus before trying to recover. It >> could be that device itself is lost from the bus in some hang >> situations. >> >> Signed-off-by: Lijo Lazar >> --- >>  

Re: [PATCH] drm/amdgpu/pm: Fix the ppfeature value

2024-03-12 Thread Lazar, Lijo
On 3/13/2024 8:15 AM, Ma, Jun wrote: > > > On 3/12/2024 8:57 PM, Lazar, Lijo wrote: >> >> >> On 3/12/2024 4:29 PM, Ma Jun wrote: >>> Sometimes user may want to enable the od feature >>> by setting ppfeaturemask when loading amdgpu driver. >>

Re: [PATCH] drm/amdgpu/pm: Fix the ppfeature value

2024-03-12 Thread Lazar, Lijo
On 3/12/2024 4:29 PM, Ma Jun wrote: > Sometimes user may want to enable the od feature > by setting ppfeaturemask when loading amdgpu driver. > However,not all Asics support this feature. > So we need to restore the ppfeature value and print > a warning info. > > Signed-off-by: Ma Jun > --- >

Re: [PATCH] drm/amdgpu: Handle duplicate BOs during process restore

2024-03-11 Thread Lazar, Lijo
On 3/8/2024 10:17 PM, Felix Kuehling wrote: > On 2024-03-08 11:22, Mukul Joshi wrote: >> In certain situations, some apps can import a BO multiple times >> (through IPC for example). To restore such processes successfully, >> we need to tell drm to ignore duplicate BOs. >> While at it, also add

Re: [PATCH v2] drm/amdgpu/pm: Fix NULL pointer dereference when get power limit

2024-03-08 Thread Lazar, Lijo
On 3/8/2024 3:21 PM, Ma Jun wrote: > Because powerplay_table initialization is skipped under > sriov case, We set default lower and upper OD value to > avoid NULL pointer issue. > > Also, It's necessary to check od capability before > using the power limit value from powerplay_table. > >

Re: [PATCH] drm/amdgpu/pm: Fix NULL pointer dereference when set/get power limit

2024-03-06 Thread Lazar, Lijo
On 3/7/2024 7:42 AM, Ma, Jun wrote: > Hi Lijo, > > On 3/6/2024 7:16 PM, Lazar, Lijo wrote: >> >> >> On 3/6/2024 3:56 PM, Ma Jun wrote: >>> Because powerplay_table initialization is skipped under >>> sriov case, We set default lower and u
