Re: [PATCH] drm/amdgpu/pm: Fix NULL pointer dereference when set/get power limit

2024-03-06 Thread Lazar, Lijo
On 3/6/2024 3:56 PM, Ma Jun wrote: > Because powerplay_table initialization is skipped under > sriov case, We set default lower and upper OD value to > avoid NULL pointer issue. pp_od_clk_voltage is not enabled in SRIOV (except for GC 9.4.3 one VF mode). Since the interface is not available

Re: [PATCH 1/2] drm/amdgpu: Add HDP flush to VCN v4.0.3

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:44 PM, Christian König wrote: > Am 05.03.24 um 10:01 schrieb Lazar, Lijo: >> On 3/5/2024 2:22 PM, Christian König wrote: >>> Am 05.03.24 um 07:40 schrieb Lijo Lazar: >>>> VCN 4.0.3 cannot trigger HDP flush with RRMT enabled. Instead, trigger

Re: [PATCH 2/2] drm/amdgpu: Use offsets local to VCN in VF

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:48 PM, Christian König wrote: > Am 05.03.24 um 10:03 schrieb Lazar, Lijo: >> >> On 3/5/2024 2:24 PM, Christian König wrote: >>> >>> Am 05.03.24 um 07:40 schrieb Lijo Lazar: >>>> For VCN 4.0.3, use only the local addressing scheme wh

Re: [PATCH 2/2] drm/amdgpu: Use offsets local to VCN in VF

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:24 PM, Christian König wrote: > > > Am 05.03.24 um 07:40 schrieb Lijo Lazar: >> For VCN 4.0.3, use only the local addressing scheme while in VF >> mode. This includes addressing scheme used for HUB offsets. >> >> Signed-off-by: Lijo Lazar >> --- >>  

Re: [PATCH 1/2] drm/amdgpu: Add HDP flush to VCN v4.0.3

2024-03-05 Thread Lazar, Lijo
On 3/5/2024 2:22 PM, Christian König wrote: > Am 05.03.24 um 07:40 schrieb Lijo Lazar: >> VCN 4.0.3 cannot trigger HDP flush with RRMT enabled. Instead, trigger >> HDP flush from host side before ringing doorbell. > > Well that won't work like that. > > The HDP flush is supposed to be emitted

Re: [PATCH] drm/amdgpu: Fix potential Spectre vulnerability in amdgpu_gfx_parse_disable_cu()

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 7:52 PM, Christian König wrote: > Am 01.03.24 um 15:01 schrieb Lazar, Lijo: >> On 3/1/2024 6:15 PM, Srinivasan Shanmugam wrote: >>> The 'mask' array could be used in a way that would make the code >>> vulnerable to a Spectre attack. The issue is

Re: [PATCH] drm/amdgpu: Fix potential Spectre vulnerability in amdgpu_gfx_parse_disable_cu()

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 6:15 PM, Srinivasan Shanmugam wrote: > The 'mask' array could be used in a way that would make the code > vulnerable to a Spectre attack. The issue is likely related to the fact > that the 'mask' array is being indexed using values that are derived > from user input (the 'se' and
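For readers unfamiliar with the pattern under discussion: in the kernel, an index derived from user input is normally clamped with array_index_nospec() from <linux/nospec.h> after the bounds check. The snippet below is only a userspace sketch of the branchless mask idea behind that helper. The names index_mask and clamp_index are hypothetical, and the real kernel helper additionally hides the index from the optimizer, which this sketch does not attempt.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical userspace sketch, not the kernel implementation.
 * Returns all ones when index < size and zero otherwise, without a branch.
 * Assumes size <= LONG_MAX and that right-shifting a negative long is an
 * arithmetic shift (true for GCC and Clang, implementation-defined in C).
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
	return index & index_mask(index, size);
}
```

With a four-element array, clamp_index(3, 4) keeps the index at 3, while clamp_index(4, 4) forces it to 0, so even a mispredicted bounds check cannot steer a speculative load outside the array.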

Re: [PATCH] drm/amdgpu/pm: Fix the error of pwm1_enable setting

2024-03-01 Thread Lazar, Lijo
On 3/1/2024 1:15 PM, Ma Jun wrote: > Fix the pwm_mode value error which used for > pwm1_enable setting > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-29 Thread Lazar, Lijo
On 2/29/2024 4:40 PM, Ma, Jun wrote: > Hi Lijo, > > On 2/29/2024 3:33 PM, Lazar, Lijo wrote: >> >> >> On 2/29/2024 11:49 AM, Ma Jun wrote: >>> Check return value of amdgpu_device_baco_enter/exit and print >>> warning message because these errors ma

Re: [PATCH v2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-28 Thread Lazar, Lijo
On 2/29/2024 11:49 AM, Ma Jun wrote: > Check return value of amdgpu_device_baco_enter/exit and print > warning message because these errors may cause runtime resume failure > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 -- > 1 file

Re: [PATCH] drm/amdgpu: Removed used parameter

2024-02-28 Thread Lazar, Lijo
On 2/27/2024 9:23 PM, Harish Kasiviswanathan wrote: > Also passing adev is misleading if BO is associated with different adev. > In this case BO is mapped to a different device > Looks like a typo in subject - unused? Thanks, Lijo > Signed-off-by: Harish Kasiviswanathan > --- >

Re: [PATCH 1/2] drm/amdgpu: Use rpm_mode flag instead of checking it again for rpm

2024-02-28 Thread Lazar, Lijo
On 2/28/2024 5:14 PM, Ma Jun wrote: > Because the rpm_mode flag is already set when the driver > is initialized, we use it directly for runtime suspend/resume > instead of checking it again > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH 2/2] drm/amgpu: Check return value of amdgpu_device_baco_enter/exit

2024-02-28 Thread Lazar, Lijo
On 2/28/2024 5:14 PM, Ma Jun wrote: > Check return value of amdgpu_device_baco_enter/exit and print > warning message because these errors may cause runtime resume failure > > Signed-off-by: Ma Jun > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++-- > 1 file changed, 6

Re: [PATCH] drm/amdgpu: Fix multiple truncation issues in multiple driver files

2024-02-28 Thread Lazar, Lijo
On 2/27/2024 10:05 PM, Srinivasan Shanmugam wrote: > Fixes snprintf function by writing more bytes into various buffers than > they can hold. > > In several files - smu_v13_0.c, gfx_v11_0.c, gfx_v10_0.c, gfx_v9_0.c, > and amdgpu_mes.c. They were related to different directives, such as > '%s',
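As background for the truncation warnings this thread refers to, snprintf() returns the length the full string would have had, so a return value greater than or equal to the buffer size means the output was cut short. A minimal sketch, assuming a hypothetical helper format_fw_name and an illustrative "fw/%s.bin" pattern that is not taken from the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a file name into buf and reports whether
 * the whole string fit. snprintf() always NUL-terminates when size > 0,
 * but its return value is the untruncated length.
 */
static bool format_fw_name(char *buf, size_t size, const char *chip)
{
	int n = snprintf(buf, size, "fw/%s.bin", chip);

	return n >= 0 && (size_t)n < size;
}
```

A caller can then either enlarge the buffer or treat a false return as an error instead of silently using a shortened name.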

Re: [PATCH v3] drm/amdgpu: enable pp_od_clk_voltage for gfx 9.4.3 SRIOV

2024-02-27 Thread Lazar, Lijo
On 2/28/2024 12:30 PM, Yang Wang wrote: > v1: > enabel pp_od_clk_voltage node for gfx 9.4.3 SRIOV and BM. > > v2: > add onevf check for gfx 9.4.3 > > v3: > refine code check order to make function clearly. > > Signed-off-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo > --- >

Re: [PATCH v2] drm/amdgpu: enable pp_od_clk_voltage for gfx 9.4.3 SRIOV

2024-02-27 Thread Lazar, Lijo
On 2/28/2024 12:08 PM, Yang Wang wrote: > v1: > enabel pp_od_clk_voltage node for gfx 9.4.3 SRIOV and BM. > > v2: > add onevf check for gfx 9.4.3 > > Signed-off-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 35 +- > 1 file changed, 30 insertions(+),

Re: [PATCH] drm/amdgpu: enable pp_od_clk_voltage for gfx 9.4.3 SRIOV

2024-02-27 Thread Lazar, Lijo
On 2/28/2024 11:28 AM, Yang Wang wrote: > enabel pp_od_clk_voltage node for gfx 9.4.3 SRIOV and BM. > > Signed-off-by: Yang Wang > --- > drivers/gpu/drm/amd/pm/amdgpu_pm.c | 29 - > 1 file changed, 24 insertions(+), 5 deletions(-) > > diff --git

Re: [PATCH] drm/amd/pm: Skip reporting pcie width/speed on vfs

2024-02-27 Thread Lazar, Lijo
On 2/28/2024 9:30 AM, Asad Kamal wrote: > Skip reporting pcie link width/speed on vfs for > smu_v13_0_6 & smu_v13_0_2 > > Signed-off-by: Asad Kamal > Reviewed-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo > --- > .../gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 10 ++ >

Re: [PATCH 2/2] drm/amdgpu: use new reset-affected accessors for userspace interfaces

2024-02-20 Thread Lazar, Lijo
On 2/20/2024 7:52 PM, Christian König wrote: > Am 20.02.24 um 07:32 schrieb Lazar, Lijo: >> On 2/16/2024 8:43 PM, Alex Deucher wrote: >>> Use the new reset critical section accessors for debugfs, sysfs, >>> and the INFO IOCTL to provide proper mutual exclusivity

Re: [PATCH 2/2] drm/amdgpu: use new reset-affected accessors for userspace interfaces

2024-02-19 Thread Lazar, Lijo
On 2/16/2024 8:43 PM, Alex Deucher wrote: > Use the new reset critical section accessors for debugfs, sysfs, > and the INFO IOCTL to provide proper mutual exclusivity > to hardware with respect the GPU resets. > This looks more like a priority inversion. When the device needs reset, it

Re: [PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

2024-02-19 Thread Lazar, Lijo
On 2/19/2024 1:45 PM, Tao Zhou wrote: > Let kfd interrupt handler process it. > > Signed-off-by: Tao Zhou > --- > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >

Re: [PATCH] drm/amdgpu: add UTCL2 RAS poison query for gfx 9.4.3

2024-02-18 Thread Lazar, Lijo
On 2/18/2024 12:26 PM, Tao Zhou wrote: > Add help function to query and reset RAS UTCL2 poison status. > > Signed-off-by: Tao Zhou > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c

Re: [PATCH] drm/amd/pm: Allow setting max UCLK on SMU v13.0.6

2024-02-08 Thread Lazar, Lijo
Sending another one, please ignore. Thanks, Lijo On 2/9/2024 12:04 PM, Lijo Lazar wrote: > Allow reducing max UCLK in MANUAL performance level. New UCLK value > should be less than the max DPM level UCLK level value. > > Ex: > echo manual >

Re: [PATCH] drm/amd/pm: denote S to the deep sleep clock

2024-02-07 Thread Lazar, Lijo
On 2/8/2024 11:04 AM, Kenneth Feng wrote: > denote S to the deep sleep clock for the clock output on smu > v13.0.0/v13.0.7/v13.0.10 > > Signed-off-by: Kenneth Feng > --- > .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 27 +-- > .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c

Re: [PATCH] drm/amd/pm: denote S to the actual clock

2024-02-07 Thread Lazar, Lijo
On 2/7/2024 2:03 PM, Kenneth Feng wrote: > denote S to the actual clock in smu v13.0.0/v13.0.7/v13.0.10 > > Signed-off-by: Kenneth Feng > --- > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 12 ++-- > drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 12 ++-- >

Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-28 Thread Lazar, Lijo
On 1/26/2024 2:30 PM, Liang, Prike wrote: > [AMD Official Use Only - General] > >> >> On 1/25/2024 8:52 AM, Prike Liang wrote: >>> In the pm abort case the gfx power rail not turn off from FCH side and >>> this will lead to the gfx reinitialized failed base on the unknown gfx >>> HW status, so

Re: [PATCH v2] drm/amdgpu: move the drm client creation behind drm device registration

2024-01-25 Thread Lazar, Lijo
On 1/25/2024 1:37 PM, Le Ma wrote: > This patch is to eliminate interrupt warning below: > > "[drm] Fence fallback timer expired on ring sdma0.0". > > An early vm pt clearing job is sent to SDMA ahead of interrupt enabled, > introduced by patch below: > > - drm/amdkfd: Export DMABufs

Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

2024-01-25 Thread Lazar, Lijo
On 1/25/2024 8:52 AM, Prike Liang wrote: > In the pm abort case the gfx power rail not turn off from FCH side and > this will lead to the gfx reinitialized failed base on the unknown gfx > HW status, so let's reset the gpu to a known good power state. > >From the description, this an APU only

Re: [PATCH v3] drm/amdgpu/pm: Use macro definitions in the smu IH process function

2024-01-25 Thread Lazar, Lijo
On 1/25/2024 2:20 PM, Ma Jun wrote: > Replace the hard-coded numbers with macro definition > > Signed-off-by: Ma Jun Reviewed-by: Lijo Lazar Thanks, Lijo > --- > v3: > - Add new SMU_IH_INTERRUPT_* macros for smu, keeping the original > macro definitions in sync with pmfw (kevin) > --- >

Re: [PATCH 1/1] drm/amdgpu: move the drm client creation behind drm device registration

2024-01-24 Thread Lazar, Lijo
On 1/25/2024 11:48 AM, Le Ma wrote: > This patch is to eliminate interrupt warning below: > > "[drm] Fence fallback timer expired on ring sdma0.0". > > An early vm pt clearing job is sent to SDMA ahead of interrupt enabled, > introduced by patch below: > > - drm/amdkfd: Export DMABufs

Re: [PATCH 1/1] drm/amdgpu: enable interrupt prior to kfd device_init

2024-01-24 Thread Lazar, Lijo
On 1/24/2024 2:28 PM, Le Ma wrote: > This patch is to eliminate interrupt warning below: > > "[drm] Fence fallback timer expired on ring sdma0.0". > > An early vm pt clearing job is sent to SDMA ahead of interrupt enabled, > introduced by patch below: > > - drm/amdkfd: Export DMABufs

Re: [PATCH] drm/amdgpu: Fix driver uninstallation problem

2024-01-23 Thread Lazar, Lijo
Mukul posted a patch for this already. "drm/amdgpu: Fix module unload hang with RAS enabled" Thanks, Lijo On 1/24/2024 9:09 AM, YiPeng Chai wrote: > The following is the error message: > [ 484.495995] task:rmmod state:D stack:0 pid: 2195 ppid: 2194 > flags:0x4002 >

Re: [PATCH v2 2/2] drm/amdgpu/pm: Use macro definitions in the smu IH process function

2024-01-23 Thread Lazar, Lijo
On 1/23/2024 1:43 PM, Ma Jun wrote: > Replace the hard-coded numbers with macro definition > > Signed-off-by: Ma Jun Series is Reviewed-by: Lijo Lazar Thanks, Lijo > --- > .../pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_0.h | 11 --- >

Re: [PATCH] drm/amdgpu: covert some variable sized arrays to [] style

2024-01-23 Thread Lazar, Lijo
On 1/22/2024 8:30 PM, Alex Deucher wrote: > Replace [1] with []. Silences UBSAN warnings. > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3107 > Signed-off-by: Alex Deucher typo => covert-> convert With the typo fixed in the subject - Reviewed-by: Lijo Lazar Thanks,
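To illustrate the "[1] to []" conversion in general terms: a trailing array declared with one element makes bounds checkers such as UBSAN treat any index above 0 as out of range, whereas a C99 flexible array member has no declared bound. The structure below is a hypothetical example, not one of the amdgpu tables touched by the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table layout, used only for illustration. */
struct entry_table {
	unsigned int count;
	unsigned int entries[];	/* flexible array member, formerly entries[1] */
};

/* Allocates a table with room for `count` entries and fills in test data. */
static struct entry_table *entry_table_alloc(unsigned int count)
{
	struct entry_table *t;
	unsigned int i;

	t = malloc(sizeof(*t) + (size_t)count * sizeof(t->entries[0]));
	if (!t)
		return NULL;

	t->count = count;
	for (i = 0; i < count; i++)
		t->entries[i] = i * 10;

	return t;
}
```

Note that sizeof(*t) no longer includes one array element, so allocation sizes that previously subtracted one entry need to be revisited when making this kind of change.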

Re: [PATCH] drm/amdgpu: check flag ring->no_scheduler before usage

2024-01-23 Thread Lazar, Lijo
On 1/21/2024 5:49 AM, vitaly.pros...@amd.com wrote: > From: Vitaly Prosyak > >The issue started to appear after the following commit > 11b3b9f461c5c4f700f6c8da202fcc2fd6418e1f (scheduler to variable number > of run-queues). The scheduler flag ready (ring->sched.ready) could not be >

Re: [PATCH] drm/amdgpu: Fix 'adev->gfx.rlc_fw' from request_firmware() not released in 'gfx_v10_0_init_microcode()'

2024-01-23 Thread Lazar, Lijo
On 1/23/2024 1:38 PM, Srinivasan Shanmugam wrote: > 'adev->gfx.rlc_fw' may not be released before end of > gfx_v10_0_init_microcode() function. > > Using the function release_firmware() to release adev->gfx.rlc_fw. > > Fixes the below: > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4046

Re: [PATCH 2/2] drm/amdgpu/pm: Use macro definitions in the smu IH process function

2024-01-22 Thread Lazar, Lijo
On 1/22/2024 2:12 PM, Ma Jun wrote: > Replace the hard-coded numbers with macro definition > > Signed-off-by: Ma Jun > --- > .../pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_0.h | 11 +-- > .../pm/swsmu/inc/pmfw_if/smu13_driver_if_v13_0_7.h | 11 --- >

RE: [PATCH] drm/amdgpu: Fix null pointer dereference

2024-01-22 Thread Lazar, Lijo
[AMD Official Use Only - General] Reviewed-by: Lijo Lazar Thanks, Lijo -Original Message- From: Zhang, Hawking Sent: Monday, January 22, 2024 3:27 PM To: amd-gfx@lists.freedesktop.org; Lazar, Lijo ; Deucher, Alexander ; Ma, Le Cc: Zhang, Hawking Subject: [PATCH] drm/amdgpu: Fix

Re: [PATCH v2] drm/amdgpu/pm: Fix the power source flag error

2024-01-18 Thread Lazar, Lijo
On 1/19/2024 7:24 AM, Ma, Jun wrote: Hi Lijo, On 1/18/2024 5:24 PM, Lazar, Lijo wrote: On 1/18/2024 2:31 PM, Ma, Jun wrote: On 1/18/2024 4:38 PM, Lazar, Lijo wrote: On 1/18/2024 12:57 PM, Ma Jun wrote: The power source flag should be updated when [1] System receives an interrupt

Re: [PATCH] drm/amd/pm: udpate smu v13.0.6 message permission

2024-01-18 Thread Lazar, Lijo
On 1/19/2024 9:17 AM, Yang Wang wrote: update smu v13.0.6 message to allow guest driver set gfx clock. Signed-off-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff

Re: [PATCH v2] drm/amdgpu/pm: Fix the power source flag error

2024-01-18 Thread Lazar, Lijo
On 1/18/2024 2:31 PM, Ma, Jun wrote: On 1/18/2024 4:38 PM, Lazar, Lijo wrote: On 1/18/2024 12:57 PM, Ma Jun wrote: The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime suspend

Re: [PATCH v2] drm/amdgpu/pm: Fix the power source flag error

2024-01-18 Thread Lazar, Lijo
On 1/18/2024 12:57 PM, Ma Jun wrote: The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime suspend Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 13

Re: [PATCH v2] drm/amd/pm: enable amdgpu smu send message log

2024-01-17 Thread Lazar, Lijo
On 1/18/2024 11:07 AM, Yang Wang wrote: From: Yang Wang v1: enable amdgpu smu driver message log. v2: add smu/pmfw response value into debug log. Signed-off-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 9 - 1 file changed,

Re: [PATCH] drm/amd/pm: enable amdgpu smu send message log

2024-01-17 Thread Lazar, Lijo
On 1/18/2024 8:56 AM, Yang Wang wrote: From: Yang Wang enable amdgpu smu driver message log. Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c

Re: [PATCH] drm/amdgpu/pm: Fix the power source flag error

2024-01-17 Thread Lazar, Lijo
On 1/18/2024 7:54 AM, Ma, Jun wrote: Hi Lijo, On 1/17/2024 5:41 PM, Lazar, Lijo wrote: On 1/17/2024 2:22 PM, Ma Jun wrote: The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime

Re: [PATCH] drm/amdgpu/pm: Fix the power source flag error

2024-01-17 Thread Lazar, Lijo
On 1/17/2024 2:22 PM, Ma Jun wrote: The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime suspend Signed-off-by: Ma Jun --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 24

Re: [PATCH] drm/amdgpu: fix UBSAN array-index-out-of-bounds for ras_block_string[]

2024-01-16 Thread Lazar, Lijo
On 1/16/2024 4:32 PM, Yang Wang wrote: fix array index out of bounds issue for ras_block_string[] array. Fixes: 2e3675fe4e3ee ("drm/amdgpu: Align ras block enum with firmware") Signed-off-by: Yang Wang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 - 1 file changed, 4 insertions(+),

Re: [PATCH 2/3 v2] drm/amdgpu: add debug flag to place fw bo on vram for frontdoor loading

2024-01-09 Thread Lazar, Lijo
On 1/9/2024 6:30 PM, Le Ma wrote: Use debug_mask=0x8 param to help isolating data path issues on new systems in early phase. v2: rename the flag for explicitness (lijo) Signed-off-by: Le Ma Series is Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/amdgpu/amdgpu.h

Re: [PATCH 2/3] drm/amdgpu: add debug flag to change fw bo placement for frontdoor loading

2024-01-09 Thread Lazar, Lijo
On 1/9/2024 3:43 PM, Le Ma wrote: se debug_mask=0x8 param to help isolating data path issues on new systems in early phase. Signed-off-by: Le Ma --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++

Re: [PATCH] Revert "drm/amdgpu: Update irq disable flow during unload"

2024-01-08 Thread Lazar, Lijo
On 1/8/2024 4:27 PM, Asad Kamal wrote: Re-evaluate the original workaround: commit f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2") This reverts commit 2e8e792e6a51e8cb7f5f96148146b6525dbb9cef. Signed-off-by: Asad Kamal You may reword the commmit message as 'revert

Re: [PATCH] drm/amdgpu: Update irq disable flow during unload

2024-01-08 Thread Lazar, Lijo
On 1/8/2024 1:51 PM, Christian König wrote: Am 08.01.24 um 09:13 schrieb Kamal, Asad: [AMD Official Use Only - General] Hi Christian, Thank you for the comment. This is not normal reset, it is reset done during unload for smu v_13_0_2. Yeah, but this doesn't explain the rational for this.

Re: [PATCH] drm/amdgpu: Update irq disable flow during unload

2024-01-07 Thread Lazar, Lijo
On 1/5/2024 8:51 PM, Asad Kamal wrote: In certain special cases, e.g device reset before module unload, irq gets disabled as part of reset sequence and won't get enabled back. Add special check to cover such scenarios Signed-off-by: Asad Kamal Suggested-by: Lijo Lazar Please also add the

Re: [PATCH] drm/amd/pm: Add mem_busy_percent for GCv9.4.3 apu

2024-01-01 Thread Lazar, Lijo
On 12/22/2023 10:52 PM, Asad Kamal wrote: Expose sysfs entry mem_busy_percent for GC version 9.4.3 APU system Signed-off-by: Asad Kamal Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git

Re: [PATCH Review V3 1/1] drm/amdgpu: Fix ecc irq enable/disable unpaired

2023-12-22 Thread Lazar, Lijo
On 12/21/2023 11:35 AM, Stanley.Yang wrote: The ecc_irq is disabled while GPU mode2 reset suspending process, but not be enabled during GPU mode2 reset resume process. Changed from V1: only do sdma/gfx ras_late_init in aldebaran_mode2_restore_ip delete amdgpu_ras_late_resume

Re: [PATCH v2] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()'

2023-12-22 Thread Lazar, Lijo
On 12/21/2023 9:13 PM, Srinivasan Shanmugam wrote: In function 'amdgpu_device_need_post(struct amdgpu_device *adev)' - 'adev->pm.fw' may not be released before return. Using the function release_firmware() to release adev->pm.fw. Thus fixing the below:

Re: [PATCH] drm/amdkfd: Confirm list is non-empty before utilizing list_first_entry in kfd_topology.c

2023-12-22 Thread Lazar, Lijo
On 12/22/2023 9:44 AM, Srinivasan Shanmugam wrote: Before using list_first_entry, make sure to check that list is not empty. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1347 kfd_create_indirect_link_prop() warn: can 'gpu_link' even be NULL?
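Some context on why the emptiness check matters: the kernel's list_first_entry() never yields NULL. On an empty list, head->next points back at the head, so the macro computes a container pointer from the head itself. The following is a userspace sketch with a minimal list implementation; struct link, entry_of and first_link_or_null are hypothetical names, not the kfd_topology.c code.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on the kernel's. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical entry type for illustration. */
struct link {
	int id;
	struct list_head node;
};

#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_append(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Returns the first entry, or NULL when the list is empty. */
static struct link *first_link_or_null(struct list_head *head)
{
	if (head->next == head)
		return NULL;

	return entry_of(head->next, struct link, node);
}
```

In kernel code the same effect is available through list_empty() before list_first_entry(), or through list_first_entry_or_null().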

Re: [PATCH] drm/amdkfd: Use logical AND operator (&&) instead of bitwise AND (&) with a bool operand

2023-12-21 Thread Lazar, Lijo
On 12/21/2023 8:12 AM, Srinivasan Shanmugam wrote: Doing a bitwise AND between a bool and an int is generally not a good idea. The bool will be promoted to an int with value 0 or 1, the int is generally regarded as true with a non-zero value, thus ANDing them using bitwise has the potential to
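The difference described in the commit message can be shown with a small standalone sketch. The helper names are hypothetical and the values are chosen only to make the mismatch visible.

```c
#include <stdbool.h>

/*
 * Bitwise AND: the bool is promoted to int 0 or 1, so only bit 0 of
 * `flags` can survive the operation.
 */
static int both_set_bitwise(bool enabled, int flags)
{
	return enabled & flags;
}

/* Logical AND: each operand is first reduced to true or false. */
static int both_set_logical(bool enabled, int flags)
{
	return enabled && flags;
}
```

With enabled set to true and flags equal to 2, the bitwise form evaluates to 0 because 1 & 2 has no common bits, while the logical form evaluates to 1. The two agree only when bit 0 of the integer operand happens to reflect its truth value.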

Re: [PATCH v3 3/3] drm/amd: Retry delayed work handler if sensor is busy

2023-12-18 Thread Lazar, Lijo
On 12/16/2023 1:25 AM, Mario Limonciello wrote: The SW CTF delayed work handler triggers a shutdown if a sensor read failed for any reason. The specific circumstance of a busy sensor should be retried however to ensure that a good value can be returned. Signed-off-by: Mario Limonciello ---

Re: [PATCH] drm/amd: Make SW CTF handler cope with different read_sensor() results

2023-12-15 Thread Lazar, Lijo
On 12/14/2023 10:15 PM, Mario Limonciello wrote: The SW CTF handler assumes that the read_sensor() call always succeeds and has updated `hotspot_tmp`, but this may not be guaranteed. For example some of the read_sensor() callbacks will return 0 when a RAS interrupt is triggered in which case

Re: [PATCH] drm/amd/pm: fix pp_*clk_od typo

2023-12-06 Thread Lazar, Lijo
On 12/7/2023 1:09 AM, Dmitrii Galantsev wrote: Fix pp_dpm_sclk_od and pp_dpm_mclk_od typos. Those were defined as pp_*clk_od but used as pp_dpm_*clk_od instead. This change removes the _dpm part. Signed-off-by: Dmitrii Galantsev Reviewed-by: Lijo Lazar Add below tag also before

Re: [PATCH] drm/amdgpu: Use the right method to get IP version

2023-12-06 Thread Lazar, Lijo
On 12/6/2023 7:40 PM, Alex Deucher wrote: On Wed, Dec 6, 2023 at 7:57 AM Lijo Lazar wrote: Replace direct usage of adev->ip_versions with amdgpu_ip_version. Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher I see two more instances of direct use. Will send a v2. Thanks, Lijo

Re: [PATCH] drm/amdgpu: Add a new module param to disable d3cold

2023-11-30 Thread Lazar, Lijo
On 11/30/2023 4:17 PM, Ma, Jun wrote: Hi Lijo, On 11/30/2023 5:18 PM, Lazar, Lijo wrote: On 11/30/2023 11:59 AM, Ma, Jun wrote: Hi Alex, On 11/30/2023 12:39 AM, Alex Deucher wrote: On Wed, Nov 29, 2023 at 11:37 AM Ma Jun wrote: Some platforms can't resume from d3cold state, So add

Re: [PATCH] drm/amdgpu: Add a new module param to disable d3cold

2023-11-30 Thread Lazar, Lijo
On 11/30/2023 11:59 AM, Ma, Jun wrote: Hi Alex, On 11/30/2023 12:39 AM, Alex Deucher wrote: On Wed, Nov 29, 2023 at 11:37 AM Ma Jun wrote: Some platforms can't resume from d3cold state, So add a new module parameter to disable d3cold state for debugging purpose or workaround. Doesn't

Re: [PATCH] drm/amdgpu: fix miss to create mca debugfs node issue

2023-11-29 Thread Lazar, Lijo
On 11/30/2023 10:39 AM, Yang Wang wrote: Use amdgpu_ip_version() helper function to check ip version. The ip verison contains other information, use the helper function to avoid reading wrong value. Signed-off-by: Yang Wang May refine the subject to "Fix missing mca debugfs node"

Re: [PATCH 3/3] drm/amd: Drop calls for checking "support" for BACO/BOCO/PX

2023-11-29 Thread Lazar, Lijo
On 11/29/2023 12:22 AM, Mario Limonciello wrote: As the module parameter can be used to control behavior, all parts of the driver should obey what has been programmed by user or detected by auto mode rather than what "can" be supported. This is also not correct. You can very well disable

Re: [PATCH 2/3] drm/amd: Introduce new enum for BAMACO

2023-11-29 Thread Lazar, Lijo
On 11/29/2023 12:22 AM, Mario Limonciello wrote: Rather than plumbing module parameter deep into IP declare BAMACO runpm mode at amdgpu_driver_set_runtime_pm_mode() and then detect this mode in consumers. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

Re: [PATCH 1/3] drm/amd: Fix handling of amdgpu.runpm on systems with BOCO

2023-11-29 Thread Lazar, Lijo
On 11/29/2023 12:22 AM, Mario Limonciello wrote: On products that support both BOCO and BACO it should be possible to override the BOCO detection and force BACO by amdgpu.runpm=1 but this doesn't work today. Adjust the logic used in amdgpu_driver_load_kms() to make sure that module

Re: PSP_VMBX_POLLING_LIMIT too big

2023-11-28 Thread Lazar, Lijo
On 11/28/2023 9:51 PM, Mario Limonciello wrote: Hi, In amd-staging-drm-next 46fe6312082c ("drm/amdgpu: update retry times for psp BL wait") and upstream a11156ff6f41 ("drm/amdgpu: update retry times for psp BL wait") the number of loops for psp_v13_0_wait_for_bootloader() to try again

Re: [PATCH] drm/amdgpu: Fix uninitialized return value

2023-11-28 Thread Lazar, Lijo
On 11/28/2023 3:07 PM, Christian König wrote: Am 27.11.23 um 22:55 schrieb Alex Deucher: On Mon, Nov 27, 2023 at 2:22 PM Christian König wrote: Am 27.11.23 um 19:29 schrieb Lijo Lazar: The return value is uniinitialized if ras context is NULL. Fixes: 0f4c8faa043c (drm/amdgpu: Move mca

Re: [PATCH v3] drm/amdkfd: Run restore_workers on freezable WQs

2023-11-24 Thread Lazar, Lijo
On 11/24/2023 4:25 AM, Felix Kuehling wrote: Make restore workers freezable so we don't have to explicitly flush them in suspend and GPU reset code paths, and we don't accidentally try to restore BOs while the GPU is suspended. Not having to flush restore_work also helps avoid lock/fence

Re: [PATCH] drm/amdkfd: Copy HW exception data to user event

2023-11-16 Thread Lazar, Lijo
On 11/17/2023 10:50 AM, David Yat Sin wrote: Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not have valid data Signed-off-by: David Yat Sin Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 1 file changed, 4

Re: [PATCH v3] drm/amdgpu: Move mca debug mode decision to ras

2023-11-16 Thread Lazar, Lijo
On 11/10/2023 1:25 PM, Lijo Lazar wrote: Refactor code such that ras block decides the default mca debug mode, and not swsmu block. By default mca debug mode is set to false. Signed-off-by: Lijo Lazar --- v3: Default mca debug mode is set to false v2: Set mca debug mode early before ras

Re: [PATCH v3 7/7] PCI: Exclude PCIe ports used for virtual links in pcie_bandwidth_available()

2023-11-15 Thread Lazar, Lijo
On 11/16/2023 2:39 AM, Mario Limonciello wrote: On 11/15/2023 11:04, Mario Limonciello wrote: On 11/14/2023 21:23, Lazar, Lijo wrote: On 11/15/2023 1:37 AM, Mario Limonciello wrote: The USB4 spec specifies that PCIe ports that are used for tunneling PCIe traffic over USB4 fabric

Re: [PATCH v2 1/2] drm/amd: Use the first non-dGPU PCI device for BW limits

2023-11-15 Thread Lazar, Lijo
On 11/11/2023 4:04 AM, Mario Limonciello wrote: When bandwidth limits are looked up using pcie_bandwidth_available() virtual links such as USB4 are analyzed which might not represent the real speed. Furthermore devices may change speeds autonomously which may introduce conditional variation

Re: [PATCH] drm/amd/pm: Fix the error of dpm_enabled flag

2023-11-14 Thread Lazar, Lijo
On 11/15/2023 10:48 AM, Ma Jun wrote: Set dpm_enabled flag to false only when dpms is successfully disabled. This a software flag and we block many services based on this flag status. I think the purpose of setting it early is to block other service calls which could come in between. Did

Re: [PATCH v3 7/7] PCI: Exclude PCIe ports used for virtual links in pcie_bandwidth_available()

2023-11-14 Thread Lazar, Lijo
On 11/15/2023 1:37 AM, Mario Limonciello wrote: The USB4 spec specifies that PCIe ports that are used for tunneling PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s and behave as a PCIe Gen1 device. The actual performance of these ports is controlled by the fabric

Re: [PATCH 1/2] drm/amd/pm: Update metric table for smu v13_0_6

2023-11-14 Thread Lazar, Lijo
On 11/14/2023 2:25 PM, Asad Kamal wrote: Update pmfw metric table to include pcie instantaneous bandwidth & pcie error counters Signed-off-by: Asad Kamal Reviewed-by: Le Ma Series is - Reviewed-by: Lijo Lazar Thanks, Lijo ---

Re: [PATCH] drm/amdgpu: Skip execution of pending reset jobs

2023-11-10 Thread Lazar, Lijo
On 11/10/2023 8:18 PM, Christian König wrote: Am 09.11.23 um 08:38 schrieb Lijo Lazar: cancel_work is not backported to all custom kernels. Well this is pretty clear NAK to pushing this upstream. We absolutely can't add workaround for older kernels. You could keep this in the backported

Re: [PATCH] drm/amdgpu: Skip execution of pending reset jobs

2023-11-10 Thread Lazar, Lijo
On 11/9/2023 1:08 PM, Lijo Lazar wrote: cancel_work is not backported to all custom kernels. Add a workaround to skip execution of already queued recovery jobs, if the device is already reset. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +

Re: [PATCH] drm/amd/pm: make power values signed

2023-11-10 Thread Lazar, Lijo
On 11/9/2023 2:11 PM, José Pekkarinen wrote: The following patch will convert the power values returned by amdgpu_hwmon_get_power to signed, fixing the following warnings reported by coccinelle: drivers/gpu/drm/amd/pm/amdgpu_pm.c:2801:5-8: WARNING: Unsigned expression compared with zero:

Re: [PATCH 3/3] drm/amdgpu: add new INFO IOCTL query for input power

2023-11-10 Thread Lazar, Lijo
On 11/10/2023 3:44 AM, Alex Deucher wrote: Some chips provide both average and input power. Previously we just exposed average power, add a new query for input power. Input looks like a misnomer (not the supply side, but the power consumed). Better to rename to instantaneous or current

Re: [PATCH 1/4] drm/amdgpu: Add flag to enable indirect RLCG access for gfx v9.4.3

2023-11-08 Thread Lazar, Lijo
On 11/3/2023 8:26 PM, Victor Lu wrote: The "rlcg_reg_access_supported" flag is missing. Add it back in. Signed-off-by: Victor Lu Series is Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 1 + 1 file changed, 1 insertion(+) diff --git

Re: [PATCH v3] drm: amd: Resolve Sphinx unexpected indentation warning

2023-11-07 Thread Lazar, Lijo
On 11/7/2023 9:58 PM, Hunter Chasens wrote: Resolves Sphinx unexpected indentation warning when compiling documentation (e.g. `make htmldocs`). Replaces tabs with spaces and adds a literal block to keep vertical formatting of the example power state list. Signed-off-by: Hunter Chasens

Re: [PATCH v2] drm: amd: Resolve Sphinx unexpected indentation warning

2023-11-06 Thread Lazar, Lijo
On 11/7/2023 1:47 AM, Hunter Chasens wrote: Resolves Sphinx unexpected indentation warning when compiling documentation (e.g. `make htmldocs`). Replaces tabs with spaces and adds a literal block to keep vertical formatting of the example power state list. Signed-off-by: Hunter Chasens ---

Re: [PATCH v1] drm: amd: Resolve Sphinx unexpected indentation warning

2023-11-06 Thread Lazar, Lijo
On 11/6/2023 2:30 AM, Hunter Chasens wrote: Resolves Sphinx unexpected indentation warning when compiling documentation (e.g. `make htmldocs`). Replaces tabs with spaces and adds a literal block to keep vertical formatting of the example power state list. Signed-off-by: Hunter Chasens

Re: [PATCH v2 8/9] PCI: Exclude PCIe ports used for tunneling in pcie_bandwidth_available()

2023-11-04 Thread Lazar, Lijo
On 11/4/2023 12:37 AM, Mario Limonciello wrote: The USB4 spec specifies that PCIe ports that are used for tunneling PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s and behave as a PCIe Gen1 device. The actual performance of these ports is controlled by the fabric

Re: [PATCH 3/4] drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v4)

2023-11-03 Thread Lazar, Lijo
On 11/2/2023 8:34 PM, Victor Lu wrote: amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. Using amdgpu_sriov_runtime to determine whether to access

Re: [PATCH] drm/amdgpu: correct amdgpu ip block rev info

2023-10-30 Thread Lazar, Lijo
On 10/31/2023 8:20 AM, Yang Wang wrote: correct following amdgpu ip block version information: - gfx_v9_4_3 - sdma_v4_4_2 Signed-off-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2

Re: [PATCH] drm/amdgpu: remove unused macro HW_REV

2023-10-30 Thread Lazar, Lijo
On 10/31/2023 7:42 AM, Yang Wang wrote: remove unused macro HW_REV Signed-off-by: Yang Wang Reviewed-by: Lijo Lazar Thanks, Lijo --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h

Re: [PATCH] drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait (v3)

2023-10-30 Thread Lazar, Lijo
On 10/28/2023 1:41 AM, Victor Lu wrote: amdgpu_virt_kiq_reg_write_reg_wait is hardcoded to use MEC engine 0. Add xcc_inst as a parameter to allow it to use different MEC engines. v3: use first xcc for MMHUB in gmc_v9_0_flush_gpu_tlb v2: rebase Signed-off-by: Victor Lu ---

Re: [PATCH] drm/amdgpu: Add xcc instance parameter to *REG32_SOC15_IP_NO_KIQ (v3)

2023-10-30 Thread Lazar, Lijo
On 10/28/2023 1:36 AM, Victor Lu wrote: The WREG32/RREG32_SOC15_IP_NO_KIQ call is using XCC0's RLCG interface when programming other XCCs. Add xcc instance parameter to them. v3: xcc not needed for MMHUB v2: rebase Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

Re: [PATCH 3/5] drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v3)

2023-10-27 Thread Lazar, Lijo
On 10/26/2023 2:22 AM, Victor Lu wrote: amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. v3: use W/RREG32_XCC to handle non-kiq case v2: define

Re: [PATCH 2/5] drm/amdgpu: Add xcc instance parameter to *REG32_SOC15_IP_NO_KIQ (v2)

2023-10-27 Thread Lazar, Lijo
On 10/26/2023 2:22 AM, Victor Lu wrote: The WREG32/RREG32_SOC15_IP_NO_KIQ call is using XCC0's RLCG interface when programming other XCCs. Add xcc instance parameter to them. v2: rebase Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 16

Re: [PATCH 3/5] drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v3)

2023-10-26 Thread Lazar, Lijo
On 10/26/2023 2:22 AM, Victor Lu wrote: amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0. Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter. v3: use W/RREG32_XCC to handle non-kiq case v2: define

Re: [PATCH 5/6] drm/amdgpu: Add new macro WREG32_RLC_XCC (v2)

2023-10-25 Thread Lazar, Lijo
On 10/4/2023 6:26 AM, Victor Lu wrote: WREG32_RLC does not specify the correct XCC so the RLCG interface does not work. Define WREG32_RLC_XCC to be like WREG32_RLC but include a parameter to specify the XCC. v2: Add new macro WREG32_RLC_XCC instead of modifying existing WREG32_RLC macro

Re: [PATCH] drm/amdgpu: Use pcie domain of xcc acpi objects

2023-10-23 Thread Lazar, Lijo
[AMD Official Use Only - General] Thanks, Lijo From: amd-gfx on behalf of Lijo Lazar Sent: Friday, October 20, 2023 8:44:22 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kasiviswanathan, Harish ; Zhang, Hawking Subject: [PATCH] drm/amdgpu:

Re: [PATCH 1/2] drm/amdgpu : Add hive ras recovery check

2023-10-16 Thread Lazar, Lijo
On 10/17/2023 9:58 AM, Asad Kamal wrote: Add hive ras recovery check and propagate fatal error to aids of all sockets in the hive Maybe reword it as 'If one of the devices in the hive detects a fatal error, need to send ras recovery reset message to PMFW of all devices in the hive. For

RE: [PATCH] drm/amdgpu: Restrict PSP OS version read to PF

2023-10-11 Thread Lazar, Lijo
[AMD Official Use Only - General] Please ignore this patch as tOS is not loaded on VF and hence the path is not taken. Thanks, Lijo -Original Message- From: amd-gfx On Behalf Of Lijo Lazar Sent: Thursday, October 12, 2023 11:21 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher,

Re: [PATCH 3/3] drm/amd/pm: Add P2S tables for SMU v13.0.6

2023-10-11 Thread Lazar, Lijo
From: Lazar, Lijo Sent: Wednesday, October 11, 2023 23:32 To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander Subject: [PATCH 3/3] drm/amd/pm: Add P2S tables for SMU v13.0.6 Add P2S table load support on SMU v13.0.6 ASICs. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/

Re: [PATCH v2] drm/amdgpu: Expose ras version & schema info

2023-10-11 Thread Lazar, Lijo
On 10/11/2023 2:55 PM, Asad Kamal wrote: Expose ras table version & schema info to sysfs v2: Updated schema to get poison support info from ras context, removed asic specific checks Signed-off-by: Asad Kamal One nit inline. With/without that change, Reviewed-by: Lijo Lazar ---
