[PATCH] drm/amdgpu: Fix gfx10 kiq ring_lock warning on full reset

2024-07-22 Thread Jesse Zhang
Fix the soft-lockup warning on the KIQ ring lock.
Unlock the KIQ ring_lock when the queue reset fails.
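
The lock discipline behind this fix can be modeled in user space. This is a toy sketch, not amdgpu code. `struct toy_ring`, `toy_lock()`, `toy_unlock()` and `toy_reset_queue()` are hypothetical stand-ins for the KIQ ring, `spin_lock_irqsave()`/`spin_unlock_irqrestore()` and the reset path. The sketch assumes, as the diff suggests, that the lock is dropped right after the packets are committed and before the ring test. With that ordering, a failing test cannot leave the lock held for the next caller, which in the trace that follows is `amdgpu_gfx_disable_kgq()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a ring guarded by a spinlock (hypothetical, not the
 * amdgpu API). "held" plays the role of ring_lock. */
struct toy_ring {
	int held;         /* 1 while the lock is taken */
	bool test_passes; /* simulated outcome of the ring test */
};

/* A real spinlock would spin forever if it were already held; the
 * assert turns that soft lockup into an immediate, visible failure. */
static void toy_lock(struct toy_ring *r)
{
	assert(r->held == 0);
	r->held = 1;
}

static void toy_unlock(struct toy_ring *r)
{
	assert(r->held == 1);
	r->held = 0;
}

static int toy_ring_test(const struct toy_ring *r)
{
	return r->test_passes ? 0 : -1;
}

/* Write and commit packets under the lock, then release it before the
 * ring test. */
static int toy_reset_queue(struct toy_ring *r)
{
	toy_lock(r);
	/* ... emit and commit packets here ... */
	toy_unlock(r);

	/* Any failure is returned with the lock already released. */
	return toy_ring_test(r);
}
```

Calling `toy_reset_queue()` twice on a ring whose test fails shows the point: the second call can take the lock again instead of spinning.
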

[  285.999224] amdgpu :03:00.0: amdgpu: GPU reset begin!
[  312.018425] watchdog: BUG: soft lockup - CPU#11 stuck for 26s! [kworker/u64:2:878]
[  312.018428] Modules linked in: amdgpu(E) amdxcp drm_exec gpu_sched drm_buddy 
drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core 
drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace 
netfs xt_conntrack nft_chain_nat r8153_ecm cdc_ether usbnet xt_MASQUERADE 
nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
xfrm_user xfrm_algo xt_multiport xt_addrtype nft_compat nf_tables br_netfilter 
libcrc32c nfnetlink bridge stp llc r8152 mii joydev input_leds overlay 
snd_hda_codec_hdmi edac_mce_amd snd_hda_intel snd_intel_dspcfg 
snd_intel_sdw_acpi kvm_amd snd_hda_codec snd_hda_core snd_hwdep kvm hid_generic 
snd_pcm crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 
sha1_ssse3 aesni_intel snd_seq_midi snd_seq_midi_event snd_rawmidi crypto_simd 
usbhid cryptd hid snd_seq snd_pci_acp5x snd_seq_device snd_timer 
snd_rn_pci_acp3x rapl snd_acp_config snd_soc_acpi snd ccp snd_pci_acp3x 
wmi_bmof soundcore k10temp mac_hid sunrpc binfmt_misc sch_fq_codel msr 
parport_pc
[  312.018466]  ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 
ucsi_ccg typec_ucsi typec nvme crc32_pclmul nvme_core xhci_pci 
i2c_designware_pci i2c_piix4 xhci_pci_renesas i2c_ccgx_ucsi video wmi
[  312.018475] CPU: 11 PID: 878 Comm: kworker/u64:2 Tainted: GE 6.8.0+ #171
[  312.018477] Hardware name: AMD Splinter/Splinter-GNR, BIOS WS54117N_140 01/16/2024
[  312.018478] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[  312.018485] RIP: 0010:native_queued_spin_lock_slowpath+0x88/0x300
[  312.018490] Code: 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 41 8b 04 24 30 e4 09 
d0 a9 00 01 ff ff 75 5e 85 c0 74 14 41 0f b6 04 24 84 c0 74 0b f3 90 <41> 0f b6 
04 24 84 c0 75 f5 b8 01 00 00 00 66 41 89 04 24 5b 41 5c
[  312.018492] RSP: 0018:a327c0de7b80 EFLAGS: 0202
[  312.018493] RAX: 0001 RBX:  RCX: 
[  312.018494] RDX:  RSI:  RDI: 8ab913e16cf8
[  312.018495] RBP: a327c0de7ba8 R08:  R09: fa400704
[  312.018495] R10: a327c0de7bb8 R11: 0040 R12: 8ab913e16cf8
[  312.018496] R13: 8ab913e0 R14: 8ab913e0 R15: 8ab913e0
[  312.018497] FS:  () GS:8ab9956c() knlGS:
[  312.018498] CS:  0010 DS:  ES:  CR0: 80050033
[  312.018498] CR2: 7f44b24d319c CR3: 00023b83c000 CR4: 00750ef0
[  312.018499] PKRU: 5554
[  312.018500] Call Trace:
[  312.018501]  
[  312.018504]  ? show_regs+0x6c/0x80
[  312.018508]  ? watchdog_timer_fn+0x206/0x290
[  312.018511]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  312.018513]  ? __hrtimer_run_queues+0xc8/0x220
[  312.018517]  ? hrtimer_interrupt+0x10d/0x250
[  312.018519]  ? __sysvec_apic_timer_interrupt+0x51/0x130
[  312.018522]  ? sysvec_apic_timer_interrupt+0x7f/0x90
[  312.018525]  
[  312.018525]  
[  312.018526]  ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
[  312.018529]  ? native_queued_spin_lock_slowpath+0x88/0x300
[  312.018530]  _raw_spin_lock+0x2d/0x40
[  312.018532]  amdgpu_gfx_disable_kgq+0x6f/0x1d0 [amdgpu]
[  312.018646]  gfx_v10_0_hw_fini+0x111/0x130 [amdgpu]
[  312.018742]  gfx_v10_0_suspend+0x12/0x20 [amdgpu]
[  312.018832]  amdgpu_device_ip_suspend_phase2+0x244/0x470 [amdgpu]
[  312.018909]  amdgpu_device_ip_suspend+0x4b/0x90 [amdgpu]
[  312.018989]  amdgpu_device_pre_asic_reset+0xda/0x4b0 [amdgpu]
[  312.019068]  amdgpu_device_gpu_recover+0x319/0xe20 [amdgpu]
[  312.019147]  amdgpu_job_timedout+0x177/0x280 [amdgpu]
[  312.019266]  drm_sched_job_timedout+0x7c/0x100 [gpu_sched]
[  312.019269]  process_scheduled_works+0x9a/0x3a0
[  312.019272]  ? __pfx_worker_thread+0x10/0x10
[  312.019273]  worker_thread+0x15f/0x2d0
[  312.019275]  ? __pfx_worker_thread+0x10/0x10
[  312.019276]  kthread+0xfb/0x130
[  312.019277]  ? __pfx_kthread+0x10/0x10
[  312.019278]  ret_from_fork+0x3d/0x60
[  312.019280]  ? __pfx_kthread+0x10/0x10
[  312.019281]  ret_from_fork_asm+0x1b/0x30
[  312.019284]  

Signed-off-by: Vitaly Prosyak 
Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index fde11159270c..59024fbf8c22 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -9478,6 +9478,7 @@ static int gfx_v10_0_reset_compute_ring(struct amdgpu_ring *ring,
   0, 0);
amdgpu_ring_commit(kiq_ring);
 
+   spin_unlock_irqrestore(>ring_lock, flags);
r = amdgpu_ring_test_ring(kiq_ring);
if (r)
   

[PATCH 4/12 V2] drm/amdgpu: remove dead code in atom_get_src_int

2024-06-05 Thread Jesse Zhang
Since align is computed as (attr >> 3) & 7, its value is always in the range 0-7.
In the ATOM_ARG_IMM case, the code therefore cannot reach the default case,
so there is no need for the "break".
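
The range argument can be checked directly: masking with 7 leaves only three bits, so a switch with one case per value 0-7 is exhaustive. A minimal sketch, assuming the `SRC_*` values below mirror the eight `ATOM_SRC_*` selectors in atom.h (the names here are hypothetical). The operand widths of 4, 2 and 1 bytes follow the `(*ptr) += 4`, `(*ptr) += 2` and `(*ptr)++` steps in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ATOM_SRC_* selectors; only the fact that
 * there are eight of them, numbered 0..7, matters here. */
enum src_sel { SRC_DWORD, SRC_WORD0, SRC_WORD8, SRC_WORD16,
	       SRC_BYTE0, SRC_BYTE8, SRC_BYTE16, SRC_BYTE24 };

/* Extract the 3-bit alignment field: align = (attr >> 3) & 7. */
static unsigned int get_align(uint8_t attr)
{
	return (attr >> 3) & 7;
}

/* Bytes consumed by an immediate operand. Every value 0..7 has a case,
 * so control never falls out of the switch. */
static unsigned int imm_width(uint8_t attr)
{
	switch (get_align(attr)) {
	case SRC_DWORD:
		return 4;
	case SRC_WORD0:
	case SRC_WORD8:
	case SRC_WORD16:
		return 2;
	case SRC_BYTE0:
	case SRC_BYTE8:
	case SRC_BYTE16:
	case SRC_BYTE24:
		return 1;
	}
	return 0; /* unreachable: the mask limits align to 0..7 */
}
```

Iterating over all 256 possible `attr` bytes confirms that `imm_width()` never reaches its final `return 0`.
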

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/atom.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c
index d552e013354c..09715b506468 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -301,7 +301,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
(*ptr) += 4;
if (print)
DEBUG("IMM 0x%08X\n", val);
-   return val;
+   break;
case ATOM_SRC_WORD0:
case ATOM_SRC_WORD8:
case ATOM_SRC_WORD16:
@@ -309,7 +309,7 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
(*ptr) += 2;
if (print)
DEBUG("IMM 0x%04X\n", val);
-   return val;
+   break;
case ATOM_SRC_BYTE0:
case ATOM_SRC_BYTE8:
case ATOM_SRC_BYTE16:
@@ -318,9 +318,9 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
(*ptr)++;
if (print)
DEBUG("IMM 0x%02X\n", val);
-   return val;
+   break;
}
-   break;
+   return val;
case ATOM_ARG_PLL:
idx = U8(*ptr);
(*ptr)++;
-- 
2.25.1



[PATCH 1/12 V2] drm/amd/pm: remove dead code in si_convert_power_level_to_smc

2024-06-05 Thread Jesse Zhang
Since gmc_pg is always false, the statement that sets SISLANDS_SMC_MC_PG_EN in
mcFlags cannot be reached.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 68ac01a8bc3a..f324a8ef8032 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -5467,7 +5467,6 @@ static int si_convert_power_level_to_smc(struct amdgpu_device *adev,
int ret;
bool dll_state_on;
u16 std_vddc;
-   bool gmc_pg = false;
 
if (eg_pi->pcie_performance_request &&
(si_pi->force_pcie_gen != SI_PCIE_GEN_INVALID))
@@ -5487,9 +5486,6 @@ static int si_convert_power_level_to_smc(struct amdgpu_device *adev,
(RREG32(DPG_PIPE_STUTTER_CONTROL) & STUTTER_ENABLE) &&
(adev->pm.dpm.new_active_crtc_count <= 2)) {
level->mcFlags |= SISLANDS_SMC_MC_STUTTER_EN;
-
-   if (gmc_pg)
-   level->mcFlags |= SISLANDS_SMC_MC_PG_EN;
}
 
if (adev->gmc.vram_type == AMDGPU_VRAM_TYPE_GDDR5) {
-- 
2.25.1



[PATCH 12/12] drm/amdgpu: remove dead code in si_program_aspm

2024-06-03 Thread Jesse Zhang
The variable disable_l1 is always false, so execution cannot reach the else branch.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/si.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 85235470e872..d80eec275090 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -2598,9 +2598,6 @@ static void si_program_aspm(struct amdgpu_device *adev)
WREG32(SPLL_CNTL_MODE, data);
}
}
-   } else {
-   if (orig != data)
-   WREG32_PCIE_PORT(PCIE_LC_CNTL, data);
}
 
orig = data = RREG32_PCIE(PCIE_CNTL2);
-- 
2.25.1



[PATCH 11/12] drm/amdkfd: remove logically dead code

2024-06-03 Thread Jesse Zhang
idr_for_each_entry() guarantees that mem is non-NULL inside the loop body,
so there is no need to check mem again.
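
Why the loop body can never see a NULL entry is easiest to see from the shape of the iterator. The sketch below is a hypothetical array-backed analogue modeled on the kernel's `idr_for_each_entry()`, whose loop condition stores the next entry and stops when that entry is NULL. `struct toy_idr`, `toy_idr_get_next()` and `toy_idr_for_each_entry()` are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-backed analogue of an IDR: empty slots are NULL. */
struct toy_idr {
	void **slots;
	int nslots;
};

/* Return the first non-NULL entry at index >= *id and store its index in
 * *id; return NULL when no such entry exists. */
static void *toy_idr_get_next(const struct toy_idr *idr, int *id)
{
	for (int i = *id; i < idr->nslots; i++) {
		if (idr->slots[i]) {
			*id = i;
			return idr->slots[i];
		}
	}
	return NULL;
}

/* Same shape as idr_for_each_entry(): the loop ends as soon as the lookup
 * returns NULL, so the body only ever runs with a non-NULL entry. */
#define toy_idr_for_each_entry(idr, entry, id) \
	for ((id) = 0; ((entry) = toy_idr_get_next((idr), &(id))) != NULL; (id)++)

static int count_entries(const struct toy_idr *idr)
{
	void *mem;
	int id, n = 0;

	toy_idr_for_each_entry(idr, mem, id) {
		assert(mem != NULL); /* guaranteed by the loop condition */
		n++;
	}
	return n;
}
```
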

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index fdf171ad4a3c..32e5db509560 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1913,11 +1913,6 @@ static int criu_checkpoint_bos(struct kfd_process *p,
struct kfd_criu_bo_priv_data *bo_priv;
int i, dev_idx = 0;
 
-   if (!mem) {
-   ret = -ENOMEM;
-   goto exit;
-   }
-
kgd_mem = (struct kgd_mem *)mem;
dumper_bo = kgd_mem->bo;
 
-- 
2.25.1



[PATCH 10/12] drm/amdkfd: remove dead code in kq_initialize

2024-06-03 Thread Jesse Zhang
The queue type can only be KFD_QUEUE_TYPE_DIQ or KFD_QUEUE_TYPE_HIQ,
so the default case cannot be reached.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 32c926986dbb..3142b2593e2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -67,9 +67,6 @@ static bool kq_initialize(struct kernel_queue *kq, struct kfd_node *dev,
case KFD_QUEUE_TYPE_HIQ:
kq->mqd_mgr = dev->dqm->mqd_mgrs[KFD_MQD_TYPE_HIQ];
break;
-   default:
-   pr_err("Invalid queue type %d\n", type);
-   return false;
}
 
if (!kq->mqd_mgr)
-- 
2.25.1



[PATCH 09/12] [PATCH 28/28] drm/amdgpu: remove dead code in cik_program_aspm

2024-06-03 Thread Jesse Zhang
Since disable_l1 is false, the else branch cannot be reached.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index 5428fd4071b8..0ad736e775db 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1819,9 +1819,6 @@ static void cik_program_aspm(struct amdgpu_device *adev)
WREG32_SMC(ixMPLL_BYPASSCLK_SEL, data);
}
}
-   } else {
-   if (orig != data)
-   WREG32_PCIE(ixPCIE_LC_CNTL, data);
}
 
orig = data = RREG32_PCIE(ixPCIE_CNTL2);
-- 
2.25.1



[PATCH 08/12] drm/amdgpu/pm: remove dead code in aldebaran_emit_clk_levels and arcturus_emit_clk_levels

2024-06-03 Thread Jesse Zhang
The value of type is checked at the start of the function, so the switch
on type cannot reach the default case.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c  | 2 --
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index c0f6b59369b7..f31cf8ad025f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -937,8 +937,6 @@ static int arcturus_emit_clk_levels(struct smu_context *smu,
smu->smu_table.boot_values.lclk / 100);
break;
 
-   default:
-   return -EINVAL;
}
 
return 0;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 825786fc849e..35eadd7906ca 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -919,8 +919,6 @@ static int aldebaran_emit_clk_levels(struct smu_context *smu,
(freq_match) ? "*" : "");
}
break;
-   default:
-   return -EINVAL;
}
 
return 0;
-- 
2.25.1



[PATCH 07/12] drm/amdgpu: remove dead code in amdgpu_vpe_configure_dpm

2024-06-03 Thread Jesse Zhang
When switching on idx, the value of idx is always between 0 and 3, so the
switch cannot reach the default case.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 49881073ff58..fb1902ba0c80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -183,8 +183,6 @@ int amdgpu_vpe_configure_dpm(struct amdgpu_vpe *vpe)
case 3:
pratio_vmax_freq = min_freq;
break;
-   default:
-   break;
}
}
 
-- 
2.25.1



[PATCH 06/12] drm/amd/pm: remove dead code in smu_get_power_limit

2024-06-03 Thread Jesse Zhang
limit_level is checked at the start of the function, so by the time the
switch runs its value must be between -1 and 2, and the switch cannot reach
the default case.
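
The pattern is: validate once at function entry, then switch over exactly the values that survive the check. The sketch below illustrates it. The enum range -1..2 comes from the commit message. The `TOY_LIMIT_*` names and the returned limit values are hypothetical placeholders, not the real `smu_get_power_limit()` behavior.

```c
#include <errno.h>

/* Hypothetical mirror of the power-limit levels; only the range -1..2 is
 * taken from the commit message. */
enum toy_limit_level {
	TOY_LIMIT_MIN = -1,
	TOY_LIMIT_CURRENT = 0,
	TOY_LIMIT_DEFAULT = 1,
	TOY_LIMIT_MAX = 2,
};

static int toy_get_power_limit(int level, unsigned int *limit)
{
	/* Reject out-of-range levels once, at the start of the function. */
	if (level < TOY_LIMIT_MIN || level > TOY_LIMIT_MAX)
		return -EINVAL;

	/* Every value that passes the check has a case, so a default
	 * label here could never be reached. */
	switch ((enum toy_limit_level)level) {
	case TOY_LIMIT_MIN:
		*limit = 100;
		break;
	case TOY_LIMIT_CURRENT:
		*limit = 200;
		break;
	case TOY_LIMIT_DEFAULT:
		*limit = 220;
		break;
	case TOY_LIMIT_MAX:
		*limit = 250;
		break;
	}
	return 0;
}
```

Leaving out the `default` label also lets `-Wswitch` warn if a new enumerator is added without a matching case.
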

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 6f742d88867d..0b4193639e65 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -2754,8 +2754,6 @@ int smu_get_power_limit(void *handle,
case SMU_PPT_LIMIT_MIN:
*limit = smu->min_power_limit;
break;
-   default:
-   return -EINVAL;
}
}
 
-- 
2.25.1



[PATCH 05/12] drm/amd/pm: remove dead code in navi10_emit_clk_levels and navi10_print_clk_levels

2024-06-03 Thread Jesse Zhang
Since the variable i is always within the range 0 - 3, execution cannot
reach the default case.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index cf556f1b5ed1..076620fa3ef5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1389,8 +1389,6 @@ static int navi10_emit_clk_levels(struct smu_context *smu,
case 2:
curve_settings = _table->GfxclkFreq3;
break;
-   default:
-   break;
}
*offset += sysfs_emit_at(buf, *offset, "%d: %uMHz %umV\n",
  i, curve_settings[0],
@@ -1594,8 +1592,6 @@ static int navi10_print_clk_levels(struct smu_context *smu,
case 2:
curve_settings = _table->GfxclkFreq3;
break;
-   default:
-   break;
}
size += sysfs_emit_at(buf, size, "%d: %uMHz %umV\n",
  i, curve_settings[0],
-- 
2.25.1



[PATCH 04/12] drm/amdgpu: remove dead code in atom_get_src_int

2024-06-03 Thread Jesse Zhang
Since align is computed as (attr >> 3) & 7, its value is always in the range 0-7.
In the ATOM_ARG_IMM case, the code therefore cannot reach the default case,
so there is no need for the "break".

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/atom.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c b/drivers/gpu/drm/amd/amdgpu/atom.c
index d552e013354c..c660e4a663ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -320,7 +320,6 @@ static uint32_t atom_get_src_int(atom_exec_context *ctx, uint8_t attr,
DEBUG("IMM 0x%02X\n", val);
return val;
}
-   break;
case ATOM_ARG_PLL:
idx = U8(*ptr);
(*ptr)++;
-- 
2.25.1



[PATCH 03/12] drm/amdgpu: remove dead code in sdma_v6_0_load_microcode

2024-06-03 Thread Jesse Zhang
Remove the legacy firmware loading method, since that code cannot be reached.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 39 --
 1 file changed, 39 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index c833b6b8373b..b54b9cc2bf75 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -631,45 +631,6 @@ static int sdma_v6_0_load_microcode(struct amdgpu_device *adev)
msleep(1);
WREG32(sdma_v6_0_get_reg_offset(adev, 0, regSDMA0_BROADCAST_UCODE_DATA), le32_to_cpup(fw_data++));
}
-   } else {
-   dev_info(adev->dev, "Use legacy method to load SDMA firmware\n");
-   for (i = 0; i < adev->sdma.num_instances; i++) {
-   /* load Control Thread microcode */
-   hdr = (const struct sdma_firmware_header_v2_0 *)adev->sdma.instance[0].fw->data;
-   amdgpu_ucode_print_sdma_hdr(>header);
-   fw_size = le32_to_cpu(hdr->ctx_jt_offset + hdr->ctx_jt_size) / 4;
-
-   fw_data = (const __le32 *)
-   (adev->sdma.instance[0].fw->data +
-   le32_to_cpu(hdr->header.ucode_array_offset_bytes));
-
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_ADDR), 0);
-
-   for (j = 0; j < fw_size; j++) {
-   if (amdgpu_emu_mode == 1 && j % 500 == 0)
-   msleep(1);
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_DATA), le32_to_cpup(fw_data++));
-   }
-
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_ADDR), adev->sdma.instance[0].fw_version);
-
-   /* load Context Switch microcode */
-   fw_size = le32_to_cpu(hdr->ctl_jt_offset + hdr->ctl_jt_size) / 4;
-
-   fw_data = (const __le32 *)
-   (adev->sdma.instance[0].fw->data +
-   le32_to_cpu(hdr->ctl_ucode_offset));
-
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_ADDR), 0x8000);
-
-   for (j = 0; j < fw_size; j++) {
-   if (amdgpu_emu_mode == 1 && j % 500 == 0)
-   msleep(1);
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_DATA), le32_to_cpup(fw_data++));
-   }
-
-   WREG32(sdma_v6_0_get_reg_offset(adev, i, regSDMA0_UCODE_ADDR), adev->sdma.instance[0].fw_version);
-   }
}
 
return 0;
-- 
2.25.1



[PATCH 02/12] drm/amdgpu: remove dead code in cik_program_aspm

2024-06-03 Thread Jesse Zhang
Since disable_clkreq is always false, execution cannot reach the statement
clk_req_support = false.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index cf1d5d462b67..5428fd4071b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1777,8 +1777,6 @@ static void cik_program_aspm(struct amdgpu_device *adev)
pcie_capability_read_dword(root, PCI_EXP_LNKCAP, );
if (lnkcap & PCI_EXP_LNKCAP_CLKPM)
clk_req_support = true;
-   } else {
-   clk_req_support = false;
}
 
if (clk_req_support) {
-- 
2.25.1



[PATCH 01/12] drm/amd/pm: remove dead code in si_convert_power_level_to_smc

2024-06-03 Thread Jesse Zhang
Since gmc_pg is always false, the statement that sets SISLANDS_SMC_MC_PG_EN in
mcFlags cannot be reached.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 68ac01a8bc3a..a18f75a6d480 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -5487,9 +5487,6 @@ static int si_convert_power_level_to_smc(struct amdgpu_device *adev,
(RREG32(DPG_PIPE_STUTTER_CONTROL) & STUTTER_ENABLE) &&
(adev->pm.dpm.new_active_crtc_count <= 2)) {
level->mcFlags |= SISLANDS_SMC_MC_STUTTER_EN;
-
-   if (gmc_pg)
-   level->mcFlags |= SISLANDS_SMC_MC_PG_EN;
}
 
if (adev->gmc.vram_type == AMDGPU_VRAM_TYPE_GDDR5) {
-- 
2.25.1



[PATCH 00/12] *** Remove dead code ***

2024-06-03 Thread Jesse Zhang


Jesse Zhang (12):
  drm/amd/pm: remove dead code in si_convert_power_level_to_smc
  drm/amdgpu: remove dead code in cik_program_aspm
  drm/amdgpu: remove dead code in sdma_v6_0_load_microcode
  drm/amdgpu: remove dead code in atom_get_src_int
  drm/amd/pm: remove dead code in navi10_emit_clk_levels and
navi10_print_clk_levels
  drm/amd/pm: remove dead code in smu_get_power_limit
  drm/amdgpu: remove dead code in amdgpu_vpe_configure_dpm
  drm/amdgpu/pm: remove dead code in aldebaran_emit_clk_levels and
arcturus_emit_clk_levels
  drm/amdgpu: remove dead code in cik_program_aspm
  drm/amdkfd: remove dead code in kq_initialize
  drm/amdkfd: remove logically dead code
  drm/amdgpu: remove dead code in si_program_aspm

 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c   |  2 -
 drivers/gpu/drm/amd/amdgpu/atom.c |  1 -
 drivers/gpu/drm/amd/amdgpu/cik.c  |  5 ---
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c| 39 ---
 drivers/gpu/drm/amd/amdgpu/si.c   |  3 --
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |  5 ---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  3 --
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c|  3 --
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  2 -
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  2 -
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  4 --
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  2 -
 12 files changed, 71 deletions(-)

-- 
2.25.1



[PATCH 7/8] drm/amdkfd: Comment out the unused variable use_static in pm_map_queues_v9

2024-05-30 Thread Jesse Zhang
To fix the unused-value warning, remove use_static and use the parameter
is_static directly.

Signed-off-by: Jesse Zhang 
Suggested-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8ee2bedd301a..00776f08351c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -213,7 +213,6 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
struct queue *q, bool is_static)
 {
struct pm4_mes_map_queues *packet;
-   bool use_static = is_static;
 
packet = (struct pm4_mes_map_queues *)buffer;
memset(buffer, 0, sizeof(struct pm4_mes_map_queues));
@@ -234,7 +233,7 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
 
switch (q->properties.type) {
case KFD_QUEUE_TYPE_COMPUTE:
-   if (use_static)
+   if (is_static)
packet->bitfields2.queue_type =
queue_type__mes_map_queues__normal_latency_static_queue_vi;
break;
@@ -244,7 +243,6 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
break;
case KFD_QUEUE_TYPE_SDMA:
case KFD_QUEUE_TYPE_SDMA_XGMI:
-   use_static = false; /* no static queues under SDMA */
if (q->properties.sdma_engine_id < 2 &&
!pm_use_ext_eng(q->device->kfd))
packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
-- 
2.25.1



[PATCH 2/8] drm/amdkfd: fix the kdf debugger issue

2024-05-30 Thread Jesse Zhang
The expressions caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED
and caps | HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED
are always nonzero (true) regardless of the value of caps, so the capability
checks never reject anything. Use a bitwise AND to test the capability bits.
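
The difference between `|` and `&` in these checks can be shown in a few lines. This is a minimal sketch. `TOY_CAP_PRECISE_MEM_OPS` is a hypothetical single-bit constant standing in for the real `HSA_CAP_*` value, which is defined in the KFD headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit; the real HSA_CAP_* values live in the KFD headers. */
#define TOY_CAP_PRECISE_MEM_OPS (1u << 5)

/* Buggy form: OR-ing in a nonzero constant always yields a nonzero value,
 * so !(caps | FLAG) is false for every caps and the check never rejects. */
static bool missing_cap_buggy(uint32_t caps)
{
	return !(caps | TOY_CAP_PRECISE_MEM_OPS);
}

/* Fixed form: AND isolates the single capability bit. */
static bool missing_cap_fixed(uint32_t caps)
{
	return !(caps & TOY_CAP_PRECISE_MEM_OPS);
}
```

With `caps == 0`, the buggy helper reports the capability as present, while the fixed one correctly reports it as missing.
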

Fixes: 75de8428c3d632 ("drm/amdkfd: enable single alu ops for gfx12")
Signed-off-by: Jesse Zhang 
Suggested-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 3f27bab7a502..34a282540c7e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -503,13 +503,13 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)

kfd_topology_device_by_id(target->pdds[i]->dev->id);
uint32_t caps = topo_dev->node_props.capability;
 
-   if (!(caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
+   if (!(caps & HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
(*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
*flags = prev_flags;
return -EACCES;
}
 
-   if (!(caps | HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED) &&
+   if (!(caps & HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED) &&
(*flags & KFD_DBG_TRAP_FLAG_SINGLE_ALU_OP)) {
*flags = prev_flags;
return -EACCES;
-- 
2.25.1



[PATCH 8/8] drm/amdkfd: remove dead code in kfd_create_vcrat_image_gpu

2024-05-29 Thread Jesse Zhang
Since avail_size is at least VCRAT_SIZE_FOR_GPU (16384), subtracting
sizeof(struct crat_header) (40UL) and sizeof(struct crat_subtype_computeunit)
(40UL) cannot make it negative.
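
The arithmetic can be made explicit as a compile-time check: 16384 - 40 - 40 = 16304 > 0. The sketch below uses only the numbers from the commit message. The `TOY_*` macros are local to the sketch, and C11 `_Static_assert` stands in for what in-kernel code would typically express with `BUILD_BUG_ON()`.

```c
/* Sizes taken from the commit message; the macro names are local to this sketch. */
#define TOY_VCRAT_SIZE_FOR_GPU 16384
#define TOY_CRAT_HEADER_SIZE   40
#define TOY_CRAT_CU_SIZE       40

/* Compile-time proof that the two subtractions cannot go negative. */
_Static_assert(TOY_VCRAT_SIZE_FOR_GPU - TOY_CRAT_HEADER_SIZE - TOY_CRAT_CU_SIZE > 0,
	       "avail_size stays positive after the first two headers");

/* Mirrors the two avail_size -= ... steps the patch leaves in place. */
static int remaining_after_headers(int avail_size)
{
	avail_size -= TOY_CRAT_HEADER_SIZE;
	avail_size -= TOY_CRAT_CU_SIZE;
	return avail_size;
}
```
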

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 71150d503dc7..ead43386a7ef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -2213,9 +2213,6 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 * Modify length and total_entries as subunits are added.
 */
avail_size -= sizeof(struct crat_header);
-   if (avail_size < 0)
-   return -ENOMEM;
-
memset(crat_table, 0, sizeof(struct crat_header));
 
memcpy(_table->signature, CRAT_SIGNATURE,
@@ -2229,9 +2226,6 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 * First fill in the sub type header and then sub type data
 */
avail_size -= sizeof(struct crat_subtype_computeunit);
-   if (avail_size < 0)
-   return -ENOMEM;
-
sub_type_hdr = (struct crat_subtype_generic *)(crat_table + 1);
memset(sub_type_hdr, 0, sizeof(struct crat_subtype_computeunit));
 
-- 
2.25.1



[PATCH 7/8] drm/amdkfd: Comment out the unused variable use_static in pm_map_queues_v9

2024-05-29 Thread Jesse Zhang
To fix the unused-value warning, comment out the assignment to use_static.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index 8ee2bedd301a..c09476273f73 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -244,7 +244,7 @@ static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
break;
case KFD_QUEUE_TYPE_SDMA:
case KFD_QUEUE_TYPE_SDMA_XGMI:
-   use_static = false; /* no static queues under SDMA */
+   //use_static = false; /* no static queues under SDMA */
if (q->properties.sdma_engine_id < 2 &&
!pm_use_ext_eng(q->device->kfd))
packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
-- 
2.25.1



[PATCH 6/8] drm/amdkfd: remove dead code in the function svm_range_get_pte_flags

2024-05-29 Thread Jesse Zhang
The variable uncached is always false, so the condition uncached can never
be true. Remove the dead code; mapping_flags still gets AMDGPU_VM_MTYPE_UC in
the else branch.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 407636a68814..bd9c2921e0dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1171,7 +1171,6 @@ svm_range_get_pte_flags(struct kfd_node *node,
bool snoop = (domain != SVM_RANGE_VRAM_DOMAIN);
bool coherent = flags & (KFD_IOCTL_SVM_FLAG_COHERENT | KFD_IOCTL_SVM_FLAG_EXT_COHERENT);
bool ext_coherent = flags & KFD_IOCTL_SVM_FLAG_EXT_COHERENT;
-   bool uncached = false; /*flags & KFD_IOCTL_SVM_FLAG_UNCACHED;*/
unsigned int mtype_local;
 
if (domain == SVM_RANGE_VRAM_DOMAIN)
@@ -1220,9 +1219,7 @@ svm_range_get_pte_flags(struct kfd_node *node,
mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC : AMDGPU_VM_MTYPE_RW;
snoop = true;
-   if (uncached) {
-   mapping_flags |= AMDGPU_VM_MTYPE_UC;
-   } else if (domain == SVM_RANGE_VRAM_DOMAIN) {
+   if (domain == SVM_RANGE_VRAM_DOMAIN) {
/* local HBM region close to partition */
if (bo_node->adev == node->adev &&
(!bo_node->xcp || !node->xcp || bo_node->xcp->mem_id == node->xcp->mem_id))
-- 
2.25.1



[PATCH 5/8] drm/amdkfd: fix the return for the function kfd_dbg_trap_set_flags

2024-05-29 Thread Jesse Zhang
If the rewind flag is set, the function should return the final result of
setting the MES debug mode or refreshing the runlist.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 4abd275056d6..d12e5f29919a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -548,9 +548,9 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)
continue;
 
if (!pdd->dev->kfd->shared_resources.enable_mes)
-   debug_refresh_runlist(pdd->dev->dqm);
+   r = debug_refresh_runlist(pdd->dev->dqm);
else
-   kfd_dbg_set_mes_debug_mode(pdd, true);
+   r = kfd_dbg_set_mes_debug_mode(pdd, true);
}
}
 
-- 
2.25.1



[PATCH 4/8] amd/amdkfd:fix overflowed constant in the function svm_migrate_copy_to_ram

2024-05-29 Thread Jesse Zhang
If the SVM migration GART memory copy or the DMA page mapping fails on the
first iteration, the variable i is still 0, and executing i-- will overflow.
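
The wraparound that static analysis reports can be reproduced in isolation. This sketch assumes the loop counter is an unsigned 64-bit integer. The helper names are hypothetical, and the loops only count iterations instead of releasing pages. With `while (i--)`, the test that ends the loop still decrements `i`, so `i` always finishes at `UINT64_MAX`, including when the loop is entered with `i == 0`. For comparison, the second helper shows a common decrement-first idiom that visits the same indices (i-1 down to 0) without wrapping. It is one alternative, not the form used in the patch.

```c
#include <stdint.h>

/* Post-decrement form: the final, failing test still decrements i,
 * so i always ends up wrapped to UINT64_MAX on exit. */
static uint64_t index_after_cleanup_postdec(uint64_t i, unsigned int *visits)
{
	*visits = 0;
	while (i--)
		(*visits)++; /* would release index i (i-1 down to 0) here */
	return i;
}

/* Decrement-first idiom: same indices visited, but i stops at 0. */
static uint64_t index_after_cleanup_predec(uint64_t i, unsigned int *visits)
{
	*visits = 0;
	while (i > 0) {
		i--;
		(*visits)++; /* would release index i here */
	}
	return i;
}
```
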

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 8ee3d07ffbdf..3620eabf13c7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -650,9 +650,10 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 out_oom:
if (r) {
pr_debug("failed %d copy to ram\n", r);
-   while (i--) {
+   while (i) {
svm_migrate_put_sys_page(dst[i]);
migrate->dst[i] = 0;
+   i--;
}
}
 
-- 
2.25.1



[PATCH 3/8] drm/amdkfd: fix overflow for the function criu_restore_bos

2024-05-29 Thread Jesse Zhang
When copying the information from user space fails, the code jumps to exit
while the variable i is still 0, so executing i-- will overflow.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index fdf171ad4a3c..dac8fdc49e3b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2480,10 +2480,11 @@ static int criu_restore_bos(struct kfd_process *p,
ret = -EFAULT;
 
 exit:
-   while (ret && i--) {
+   while (ret && i) {
if (bo_buckets[i].alloc_flags
   & (KFD_IOC_ALLOC_MEM_FLAGS_VRAM | KFD_IOC_ALLOC_MEM_FLAGS_GTT))
close_fd(bo_buckets[i].dmabuf_fd);
+   i--;
}
kvfree(bo_buckets);
kvfree(bo_privs);
-- 
2.25.1



[PATCH 2/8] drm/amdkfd: fix the kdf debugger issue

2024-05-29 Thread Jesse Zhang
The expression caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED
is always true regardless of the value of caps.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
index 3f27bab7a502..4abd275056d6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
@@ -503,7 +503,7 @@ int kfd_dbg_trap_set_flags(struct kfd_process *target, uint32_t *flags)

kfd_topology_device_by_id(target->pdds[i]->dev->id);
uint32_t caps = topo_dev->node_props.capability;
 
-   if (!(caps | HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
+   if (!(caps & HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED) &&
(*flags & KFD_DBG_TRAP_FLAG_SINGLE_MEM_OP)) {
*flags = prev_flags;
return -EACCES;
-- 
2.25.1
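A minimal userspace sketch of the operator mix-up, using a hypothetical bit `CAP_PRECISE_MEM_OPS` in place of the real `HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED` (whose value is not assumed here). With `|`, the result is nonzero for every `caps`, so `!(...)` is always false and the unsupported case is never reported; with `&`, the negation is true exactly when the bit is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit standing in for the real HSA_CAP_* flag. */
#define CAP_PRECISE_MEM_OPS (1u << 5)

/* Buggy form: caps | FLAG is nonzero for every caps, so this is always false. */
static bool unsupported_buggy(uint32_t caps)
{
	return !(caps | CAP_PRECISE_MEM_OPS);
}

/* Fixed form: caps & FLAG is nonzero only when the bit is actually set. */
static bool unsupported_fixed(uint32_t caps)
{
	return !(caps & CAP_PRECISE_MEM_OPS);
}
```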



[PATCH 1/8] drm/amdgpu: fix unintentional integer overflow for mall size

2024-05-29 Thread Jesse Zhang
The potentially overflowing expression mall_size_per_umc * adev->gmc.num_umc
has type unsigned int (32 bits, unsigned), so it is evaluated using 32-bit
arithmetic and only then used in a context that expects a u64 (64 bits,
unsigned). Cast one operand to uint64_t so the multiplication is 64-bit.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 98e8f30824c3..9e0cfe06c8b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -1639,7 +1639,7 @@ static int amdgpu_discovery_get_mall_info(struct amdgpu_device *adev)
break;
case 2:
mall_size_per_umc = le32_to_cpu(mall_info->v2.mall_size_per_umc);
-   adev->gmc.mall_size = mall_size_per_umc * adev->gmc.num_umc;
+   adev->gmc.mall_size = (uint64_t)mall_size_per_umc * adev->gmc.num_umc;
break;
default:
dev_err(adev->dev,
-- 
2.25.1
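A small sketch of why the cast position matters. With both operands as `uint32_t`, the product wraps modulo 2^32 before it is widened; casting one operand to `uint64_t` first makes the multiplication itself 64-bit. The helper names are hypothetical and the operand values in the checks are illustrative, not real MALL sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits, then widened: the high bits are already lost. */
static uint64_t mall_size_32(uint32_t per_umc, uint32_t num_umc)
{
	return per_umc * num_umc;
}

/* Casting one operand first makes the multiplication 64-bit. */
static uint64_t mall_size_64(uint32_t per_umc, uint32_t num_umc)
{
	return (uint64_t)per_umc * num_umc;
}
```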



[PATCH V2] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Jesse Zhang
The function amdgpu_vm_pt_parent() may return NULL. To make the code more
robust, check the parent pointer before dereferencing it.

Signed-off-by: Jesse Zhang 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 0763382d305a..e39d6e7643bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -706,11 +706,15 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params,
 struct amdgpu_vm_bo_base *entry)
 {
struct amdgpu_vm_bo_base *parent = amdgpu_vm_pt_parent(entry);
-   struct amdgpu_bo *bo = parent->bo, *pbo;
+   struct amdgpu_bo *bo, *pbo;
struct amdgpu_vm *vm = params->vm;
uint64_t pde, pt, flags;
unsigned int level;
 
+   if (WARN_ON(!parent))
+   return -EINVAL;
+
+   bo = parent->bo;
for (level = 0, pbo = bo->parent; pbo; ++level)
pbo = pbo->parent;
 
-- 
2.25.1



[PATCH V2] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Jesse Zhang
The function amdgpu_vm_pt_parent() may return NULL. To make the code more
robust, check the parent pointer before dereferencing it.

V2: When parent is NULL here, we should probably call BUG() instead. (Christian)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 0763382d305a..6fac8440012e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -706,11 +706,17 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params,
 struct amdgpu_vm_bo_base *entry)
 {
struct amdgpu_vm_bo_base *parent = amdgpu_vm_pt_parent(entry);
-   struct amdgpu_bo *bo = parent->bo, *pbo;
+   struct amdgpu_bo *bo, *pbo;
struct amdgpu_vm *vm = params->vm;
uint64_t pde, pt, flags;
unsigned int level;
 
+   if (!parent) {
+   BUG();
+   return -EINVAL;
+   }
+   bo = parent->bo;
+
for (level = 0, pbo = bo->parent; pbo; ++level)
pbo = pbo->parent;
 
-- 
2.25.1



[PATCH] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-05-23 Thread Jesse Zhang
The function amdgpu_vm_pt_parent() may return NULL. To make the code more
robust, check the parent pointer before dereferencing it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 0763382d305a..bad8d2c31202 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -706,11 +706,15 @@ int amdgpu_vm_pde_update(struct amdgpu_vm_update_params *params,
 struct amdgpu_vm_bo_base *entry)
 {
struct amdgpu_vm_bo_base *parent = amdgpu_vm_pt_parent(entry);
-   struct amdgpu_bo *bo = parent->bo, *pbo;
+   struct amdgpu_bo *bo, *pbo;
struct amdgpu_vm *vm = params->vm;
uint64_t pde, pt, flags;
unsigned int level;
 
+   if (!parent)
+   return -EINVAL;
+   bo = parent->bo;
+
for (level = 0, pbo = bo->parent; pbo; ++level)
pbo = pbo->parent;
 
-- 
2.25.1



[PATCH 4/4 V2] drm/amdgpu: fix dereferencing null pointer context

2024-05-21 Thread Jesse Zhang
When user space sets an invalid TA type, the context pointer is NULL,
so check it before using it.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index ca5c86e5f7cd..8e8afbd237bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
@@ -334,7 +334,7 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size
 
set_ta_context_funcs(psp, ta_type, &context);
 
-   if (!context->initialized) {
+   if (!context || !context->initialized) {
dev_err(adev->dev, "TA is not initialized\n");
ret = -EINVAL;
goto err_free_shared_buf;
-- 
2.25.1
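Because `||` short-circuits, the V2 condition rejects a NULL `context` before reading `context->initialized`. The condition from the first posting of this patch, `context && !context->initialized`, is false for NULL, so execution would continue past the check with a NULL pointer. The sketch below compares both conditions on a hypothetical stand-in struct `ta_ctx`, which is not the real TA context type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TA context; only the flag matters here. */
struct ta_ctx {
	bool initialized;
};

/* V2 condition: reject when the context is missing OR not initialized. */
static bool reject_v2(const struct ta_ctx *context)
{
	return !context || !context->initialized;
}

/* First-posting condition: false for NULL, so a NULL context slips through. */
static bool reject_v1(const struct ta_ctx *context)
{
	return context && !context->initialized;
}
```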



[PATCH 3/4 V3] drm/amdgpu: fix invalid operation for pg_flags

2024-05-21 Thread Jesse Zhang
Since pg_flags is a u32, upper_32_bits(adev->pg_flags), which expands to
adev->pg_flags >> 16 >> 16, is always 0 regardless of its value.

So remove the upper_32_bits() and lower_32_bits() calls on pg_flags.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index ac0ba8b8c1aa..0e1a11b6b989 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -918,7 +918,7 @@ static ssize_t amdgpu_debugfs_gca_config_read(struct file *f, char __user *buf,
 
/* rev==1 */
config[no_regs++] = adev->rev_id;
-   config[no_regs++] = lower_32_bits(adev->pg_flags);
+   config[no_regs++] = adev->pg_flags;
config[no_regs++] = lower_32_bits(adev->cg_flags);
 
/* rev==2 */
@@ -935,7 +935,7 @@ static ssize_t amdgpu_debugfs_gca_config_read(struct file *f, char __user *buf,
config[no_regs++] = adev->flags & AMD_IS_APU ? 1 : 0;
 
/* rev==5 PG/CG flag upper 32bit */
-   config[no_regs++] = upper_32_bits(adev->pg_flags);
+   config[no_regs++] = 0;
config[no_regs++] = upper_32_bits(adev->cg_flags);
 
while (size && (*pos < no_regs * 4)) {
-- 
2.25.1
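The sketch below copies the kernel's `upper_32_bits()` and `lower_32_bits()` definitions into userspace C (with `uint32_t` standing in for the kernel's `u32`) to show why the upper half of a 32-bit flags word is always 0. The two 16-bit shifts exist so the macro also compiles for 32-bit operands without a shift-width warning, and for such operands the result is always 0; the same reasoning applies to the `uint32_t` `data_size` in the umsch patch. `cg_flags` is treated as 64-bit here, consistent with the patch keeping its `upper_32_bits()` call; `pg_flags_upper()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copies of the kernel helpers, with uint32_t standing in for u32. */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))

/* Hypothetical helper: what the rev==5 "upper 32 bit" slot held for pg_flags. */
static uint32_t pg_flags_upper(uint32_t pg_flags)
{
	return upper_32_bits(pg_flags);	/* always 0 for a 32-bit operand */
}
```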



[PATCH 2/4 V2] drm/amd/pm: fix unsigned value asic_type compared against

2024-05-21 Thread Jesse Zhang
adev->asic_type is always greater than or equal to CHIP_TAHITI, the first
value of the enum, so the check is always true; drop it and the dead else branch.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index f245fc0bc6d3..68ac01a8bc3a 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7928,12 +7928,8 @@ static void si_dpm_print_power_state(void *handle,
DRM_INFO("\tuvdvclk: %d dclk: %d\n", rps->vclk, rps->dclk);
for (i = 0; i < ps->performance_level_count; i++) {
pl = &ps->performance_levels[i];
-   if (adev->asic_type >= CHIP_TAHITI)
-   DRM_INFO("\t\tpower level %dsclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
-i, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
-   else
-   DRM_INFO("\t\tpower level %dsclk: %u mclk: %u vddc: %u vddci: %u\n",
-i, pl->sclk, pl->mclk, pl->vddc, pl->vddci);
+   DRM_INFO("\t\tpower level %dsclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
+i, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
}
amdgpu_dpm_print_ps_status(adev, rps);
 }
-- 
2.25.1



[PATCH 1/4 V2] drm/amdgpu: fix invalid operation for umsch

2024-05-21 Thread Jesse Zhang
Since data_size is a uint32_t, upper_32_bits(adev->umsch_mm.data_size - 1),
which expands to (adev->umsch_mm.data_size - 1) >> 16 >> 16, is always 0
regardless of its value.

So remove the upper_32_bits() and lower_32_bits() calls.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
index 2c5e7b0a73f9..ce3bb12e3572 100644
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
@@ -116,9 +116,8 @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm *umsch)
upper_32_bits(adev->umsch_mm.data_start_addr));
 
WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_LO,
-   lower_32_bits(adev->umsch_mm.data_size - 1));
-   WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_HI,
-   upper_32_bits(adev->umsch_mm.data_size - 1));
+   adev->umsch_mm.data_size - 1);
+   WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_HI, 0);
 
data = adev->firmware.load_type == AMDGPU_FW_LOAD_PSP ?
   0 : adev->umsch_mm.data_fw_gpu_addr;
-- 
2.25.1



[PATCH 4/4] drm/amdgpu: fix dereferencing null pointer context

2024-05-20 Thread Jesse Zhang
When user space sets an invalid TA type, the context pointer is NULL,
so check it before using it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
index ca5c86e5f7cd..ac1f423dd28f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
@@ -334,7 +334,7 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size
 
set_ta_context_funcs(psp, ta_type, &context);
 
-   if (!context->initialized) {
+   if (context && !context->initialized) {
dev_err(adev->dev, "TA is not initialized\n");
ret = -EINVAL;
goto err_free_shared_buf;
-- 
2.25.1



[PATCH 3/4] drm/amdgpu: fix invalid operation for pg_flags

2024-05-20 Thread Jesse Zhang
adev->pg_flags is a u32, so adev->pg_flags >> 16 >> 16 is always 0 regardless
of its value.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index ac0ba8b8c1aa..da5bbf46eb27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -935,7 +935,7 @@ static ssize_t amdgpu_debugfs_gca_config_read(struct file *f, char __user *buf,
config[no_regs++] = adev->flags & AMD_IS_APU ? 1 : 0;
 
/* rev==5 PG/CG flag upper 32bit */
-   config[no_regs++] = upper_32_bits(adev->pg_flags);
+   config[no_regs++] = 0;
config[no_regs++] = upper_32_bits(adev->cg_flags);
 
while (size && (*pos < no_regs * 4)) {
-- 
2.25.1



[PATCH 2/4] drm/amd/pm: fix unsigned value asic_type compared against 0

2024-05-20 Thread Jesse Zhang
adev->asic_type is always greater than or equal to CHIP_TAHITI, the first
value of the enum, so the check is always true; drop it and the dead else branch.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index f245fc0bc6d3..feca6e17113d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -7928,12 +7928,8 @@ static void si_dpm_print_power_state(void *handle,
DRM_INFO("\tuvdvclk: %d dclk: %d\n", rps->vclk, rps->dclk);
for (i = 0; i < ps->performance_level_count; i++) {
pl = &ps->performance_levels[i];
-   if (adev->asic_type >= CHIP_TAHITI)
DRM_INFO("\t\tpower level %dsclk: %u mclk: %u vddc: %u vddci: %u pcie gen: %u\n",
 i, pl->sclk, pl->mclk, pl->vddc, pl->vddci, pl->pcie_gen + 1);
-   else
-   DRM_INFO("\t\tpower level %dsclk: %u mclk: %u vddc: %u vddci: %u\n",
-i, pl->sclk, pl->mclk, pl->vddc, pl->vddci);
}
amdgpu_dpm_print_ps_status(adev, rps);
 }
-- 
2.25.1



[PATCH 1/4] drm/amdgpu: fix invalid operation for umsch

2024-05-20 Thread Jesse Zhang
(adev->umsch_mm.data_size - 1) >> 16 >> 16 is always 0 regardless of its
value, because data_size is a uint32_t.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
index 2c5e7b0a73f9..880d91a654e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c
@@ -117,8 +117,7 @@ static int umsch_mm_v4_0_load_microcode(struct amdgpu_umsch_mm *umsch)
 
WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_LO,
lower_32_bits(adev->umsch_mm.data_size - 1));
-   WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_HI,
-   upper_32_bits(adev->umsch_mm.data_size - 1));
+   WREG32_SOC15_UMSCH(regVCN_MES_LOCAL_MASK0_HI, 0);
 
data = adev->firmware.load_type == AMDGPU_FW_LOAD_PSP ?
   0 : adev->umsch_mm.data_fw_gpu_addr;
-- 
2.25.1



[PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and
use PPSMC_MSG_GfxDriverReset directly for aldebaran.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index a22eb6bbb05e..2fc4ba036afe 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1880,17 +1880,18 @@ static int aldebaran_mode1_reset(struct smu_context *smu)
 
 static int aldebaran_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-   SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0 )
-   return -EINVAL;
mutex_lock(&smu->message_lock);
if (smu->smc_fw_version >= 0x00441400) {
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, SMU_RESET_MODE_2);
+   ret = smu_cmn_send_msg_without_waiting(smu, PPSMC_MSG_GfxDriverReset,
+   SMU_RESET_MODE_2);
+   if (ret) {
+   dev_err(smu->adev->dev, "Failed to mode2 reset!\n");
+   goto out;
+   }
/* This is similar to FLR, wait till max FLR timeout */
msleep(100);
dev_dbg(smu->adev->dev, "restore config space...\n");
-- 
2.25.1



[PATCH 1/2] drm/amd/pm: check specific index for smu13

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and use
PPSMC_MSG_GfxDriverReset directly.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 46ab70a244af..6d691edf74fa 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -2330,20 +2330,15 @@ static void smu_v13_0_6_restore_pci_config(struct smu_context *smu)
 
 static int smu_v13_0_6_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-  SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0)
-   return index;
-
mutex_lock(&smu->message_lock);
-
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index,
-  SMU_RESET_MODE_2);
-
+   ret = smu_cmn_send_msg_without_waiting(smu, PPSMC_MSG_GfxDriverReset,
+   SMU_RESET_MODE_2);
+   if (ret)
+   goto out;
/* Reset takes a bit longer, wait for 200ms. */
msleep(200);
 
-- 
2.25.1



[PATCH 2/2 v2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and
use PPSMC_MSG_GfxDriverReset directly for aldebaran.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index a22eb6bbb05e..d671314c46c8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1880,17 +1880,18 @@ static int aldebaran_mode1_reset(struct smu_context *smu)
 
 static int aldebaran_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-   SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0 )
-   return -EINVAL;
mutex_lock(&smu->message_lock);
if (smu->smc_fw_version >= 0x00441400) {
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, SMU_RESET_MODE_2);
+   ret = smu_cmn_send_smc_msg_with_param(smu, PPSMC_MSG_GfxDriverReset,
+   SMU_RESET_MODE_2, NULL);
+   if (ret) {
+   dev_err(smu->adev->dev, "Failed to mode2 reset!\n");
+   goto out;
+   }
/* This is similar to FLR, wait till max FLR timeout */
msleep(100);
dev_dbg(smu->adev->dev, "restore config space...\n");
-- 
2.25.1



[PATCH 1/2 v2] drm/amd/pm: check specific index for smu13

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and
use PPSMC_MSG_GfxDriverReset directly for smu13.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 46ab70a244af..27ec95a4e81d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -2330,20 +2330,15 @@ static void smu_v13_0_6_restore_pci_config(struct smu_context *smu)
 
 static int smu_v13_0_6_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-  SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0)
-   return index;
-
mutex_lock(&smu->message_lock);
-
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index,
-  SMU_RESET_MODE_2);
-
+   ret = smu_cmn_send_smc_msg_with_param(smu, PPSMC_MSG_GfxDriverReset,
+   SMU_RESET_MODE_2, NULL);
+   if (ret)
+   goto out;
/* Reset takes a bit longer, wait for 200ms. */
msleep(200);
 
-- 
2.25.1



[PATCH 2/2] drm/amd/pm: check specific index for aldebaran

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and pass
SMU_MSG_GfxDeviceDriverReset to smu_cmn_send_smc_msg_with_param() for aldebaran.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index a22eb6bbb05e..d671314c46c8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1880,17 +1880,18 @@ static int aldebaran_mode1_reset(struct smu_context *smu)
 
 static int aldebaran_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-   SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0 )
-   return -EINVAL;
mutex_lock(&smu->message_lock);
if (smu->smc_fw_version >= 0x00441400) {
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, SMU_RESET_MODE_2);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset,
+   SMU_RESET_MODE_2, NULL);
+   if (ret) {
+   dev_err(smu->adev->dev, "Failed to mode2 reset!\n");
+   goto out;
+   }
/* This is similar to FLR, wait till max FLR timeout */
msleep(100);
dev_dbg(smu->adev->dev, "restore config space...\n");
-- 
2.25.1



[PATCH 1/2] drm/amd/pm: check specific index for smu13

2024-05-14 Thread Jesse Zhang
To avoid warnings about the index, drop the index lookup and pass
SMU_MSG_GfxDeviceDriverReset to smu_cmn_send_smc_msg_with_param() for smu13.

Signed-off-by: Jesse Zhang 
Suggested-by: Lijo Lazar 
---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 46ab70a244af..27ec95a4e81d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -2330,20 +2330,15 @@ static void smu_v13_0_6_restore_pci_config(struct smu_context *smu)
 
 static int smu_v13_0_6_mode2_reset(struct smu_context *smu)
 {
-   int ret = 0, index;
+   int ret = 0;
struct amdgpu_device *adev = smu->adev;
int timeout = 10;
 
-   index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
-  SMU_MSG_GfxDeviceDriverReset);
-   if (index < 0)
-   return index;
-
mutex_lock(&smu->message_lock);
-
-   ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index,
-  SMU_RESET_MODE_2);
-
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset,
+   SMU_RESET_MODE_2, NULL);
+   if (ret)
+   goto out;
/* Reset takes a bit longer, wait for 200ms. */
msleep(200);
 
-- 
2.25.1



[PATCH 18/22 V3] drm/amd/pm: check negative return for table entries

2024-05-13 Thread Jesse Zhang
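The function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) can return
a negative number, which is lost when the result is stored in an unsigned int.
Keep the result signed and treat values <= 0 as unsupported.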

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
index f4bd8e9357e2..18f00038d844 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
@@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
 {
int result;
unsigned int i;
-   unsigned int table_entries;
struct pp_power_state *state;
-   int size;
+   int size, table_entries;
 
if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL)
return 0;
@@ -40,15 +39,19 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
if (hwmgr->hwmgr_func->get_power_state_size == NULL)
return 0;
 
-   hwmgr->num_ps = table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr);
+   table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr);
 
-   hwmgr->ps_size = size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) +
+   size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) +
  sizeof(struct pp_power_state);
 
-   if (table_entries == 0 || size == 0) {
+   if (table_entries <= 0 || size == 0) {
pr_warn("Please check whether power state management is supported on this asic\n");
+   hwmgr->num_ps = 0;
+   hwmgr->ps_size = 0;
return 0;
}
+   hwmgr->num_ps = table_entries;
+   hwmgr->ps_size = size;
 
hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL);
if (hwmgr->ps == NULL)
-- 
2.25.1
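A userspace sketch of the signedness problem, with hypothetical callbacks standing in for `get_num_of_pp_table_entries()`. Storing a negative errno such as `-EINVAL` in an `unsigned int` turns it into a very large count, so the old `== 0` test does not catch the failure and the value would flow on towards `kcalloc()`; keeping the result in an `int` and rejecting `<= 0` catches it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callbacks standing in for get_num_of_pp_table_entries(). */
static int get_entries_failing(void)
{
	return -EINVAL;
}

static int get_entries_ok(void)
{
	return 4;
}

/* Old shape: result stored in an unsigned int and only compared with 0. */
static bool old_check_rejects(int (*get)(void))
{
	unsigned int table_entries = get();	/* -EINVAL becomes a huge count */

	return table_entries == 0;
}

/* New shape: keep the result signed and reject anything <= 0. */
static bool new_check_rejects(int (*get)(void))
{
	int table_entries = get();

	return table_entries <= 0;
}
```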



[PATCH 18/22] drm/amd/pm: check negative return for table entries

2024-05-13 Thread Jesse Zhang
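The function hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) can return
a negative number, which is lost when the result is stored in an unsigned int.
Keep the result signed and treat values <= 0 as unsupported.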

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
index f4bd8e9357e2..1276a95acc90 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
@@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
 {
int result;
unsigned int i;
-   unsigned int table_entries;
struct pp_power_state *state;
-   int size;
+   int size, table_entries;
 
if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL)
return 0;
@@ -40,15 +39,17 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
if (hwmgr->hwmgr_func->get_power_state_size == NULL)
return 0;
 
-   hwmgr->num_ps = table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr);
+   table_entries = hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr);
 
-   hwmgr->ps_size = size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) +
+   size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) +
  sizeof(struct pp_power_state);
 
-   if (table_entries == 0 || size == 0) {
+   if (table_entries <= 0 || size == 0) {
pr_warn("Please check whether power state management is supported on this asic\n");
return 0;
}
+   hwmgr->num_ps = table_entries;
+   hwmgr->ps_size = size;
 
hwmgr->ps = kcalloc(table_entries, size, GFP_KERNEL);
if (hwmgr->ps == NULL)
-- 
2.25.1



[PATCH 2/22 V2] drm/amdgpu: the warning dereferencing obj for nbio_v7_4

2024-05-13 Thread Jesse Zhang
If the ras_manager obj (or ras itself) is NULL, don't print the NBIO error data.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index fe18df10daaa..32cc60ce5521 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -383,7 +383,7 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
else
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
 
-   if (!ras->disable_ras_err_cnt_harvest) {
+   if (ras && !ras->disable_ras_err_cnt_harvest && obj) {
/*
 * clear error status after ras_controller_intr
 * according to hw team and count ue number
-- 
2.25.1



[PATCH 19/22 V2] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-10 Thread Jesse Zhang
Check the partition mode and return an error for an invalid mode, which could
otherwise lead to a division by zero on num_xcc_per_xcp.

Signed-off-by: Jesse Zhang 
Suggested-by:  Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 414ea3f560a7..b1c18b7a38ad 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -501,6 +501,12 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
 
if (mode == AMDGPU_AUTO_COMPUTE_PARTITION_MODE) {
mode = __aqua_vanjaram_get_auto_mode(xcp_mgr);
+   if (mode == AMDGPU_UNKNOWN_COMPUTE_PARTITION_MODE) {
+   dev_err(adev->dev,
+   "Invalid compute partition mode requested, requested: %s, available memory partitions: %d",
+   amdgpu_gfx_compute_mode_desc(mode), adev->gmc.num_mem_partitions);
+   return -EINVAL;
+   }
} else if (!__aqua_vanjaram_is_valid_mode(xcp_mgr, mode)) {
dev_err(adev->dev,
"Invalid compute partition mode requested, requested: %s, available memory partitions: %d",
@@ -522,6 +528,7 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
goto unlock;
 
num_xcc_per_xcp = __aqua_vanjaram_get_xcc_per_xcp(xcp_mgr, mode);
if (adev->gfx.funcs->switch_partition_mode)
adev->gfx.funcs->switch_partition_mode(xcp_mgr->adev,
   num_xcc_per_xcp);
-- 
2.25.1
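The V2 patch rejects an unknown auto-selected mode before the per-partition XCC count is derived. A different, more local technique is to guard the divisor at the point of division; the plain-C sketch below shows that alternative only as an illustration. `num_partitions()` is a hypothetical helper, not a function in aqua_vanjaram.c.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: number of partitions for a given XCC count and
 * XCCs per partition. Refuses a zero divisor instead of invoking
 * undefined behaviour.
 */
static int num_partitions(int num_xcc, int num_xcc_per_xcp)
{
	if (num_xcc_per_xcp <= 0)
		return -EINVAL;
	return num_xcc / num_xcc_per_xcp;
}
```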



[PATCH 09/22 V2] drm/amd/pm: check specific index for smu13

2024-05-10 Thread Jesse Zhang
Check whether smu_cmn_to_asic_specific_index() returns a negative (invalid)
index before using it.

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 051092f1b1b4..7c343dd12a7f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -2336,6 +2336,8 @@ static int smu_v13_0_6_mode2_reset(struct smu_context *smu)
 
index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
   SMU_MSG_GfxDeviceDriverReset);
+   if (index < 0)
+   return index;
 
mutex_lock(&smu->message_lock);
 
-- 
2.25.1



[PATCH 22/22] drm/amdgpu: clear the warning unsigned compared against 0 for xcp_id

2024-05-09 Thread Jesse Zhang
fpriv->xcp_id is unsigned, so the comparison fpriv->xcp_id >= 0 is always
true. Drop it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 977cde6d1362..66782be5917b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -618,7 +618,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
return -EINVAL;
 
if (adev->xcp_mgr && adev->xcp_mgr->num_xcps > 0 &&
-   fpriv->xcp_id >= 0 && fpriv->xcp_id < adev->xcp_mgr->num_xcps) {
+   fpriv->xcp_id < adev->xcp_mgr->num_xcps) {
xcp = >xcp_mgr->xcp[fpriv->xcp_id];
switch (type) {
case AMD_IP_BLOCK_TYPE_GFX:
-- 
2.25.1



[PATCH 21/22] drm/amd/pm: fix get dpm level count for yellow carp

2024-05-09 Thread Jesse Zhang
Return -EINVAL for invalid clk types so that callers can detect the error.
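Here is a C sketch of the change, under stated assumptions: the clock enum, the table struct, and get_dpm_level_count() are hypothetical and only loosely modeled on yellow_carp_get_dpm_level_count(). Before the fix, the default branch fell through to "return 0", so the caller saw success while *count was never written. Patch 20/22 makes the same change for smu_v13_0_5.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical clock types and table, for illustration only. */
enum clk_type { CLK_SOC, CLK_FCLK, CLK_GFX };

struct clk_table {
	uint32_t num_soc_levels;
	uint32_t num_df_pstates;
};

static int get_dpm_level_count(const struct clk_table *t,
			       enum clk_type type, uint32_t *count)
{
	switch (type) {
	case CLK_SOC:
		*count = t->num_soc_levels;
		break;
	case CLK_FCLK:
		*count = t->num_df_pstates;
		break;
	default:
		/* Unsupported type: report it instead of "succeeding" with
		 * *count left unwritten. */
		return -EINVAL;
	}
	return 0;
}
```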

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
index 5917c88cc87d..260c339f89c5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/yellow_carp_ppt.c
@@ -777,7 +777,7 @@ static int yellow_carp_get_dpm_level_count(struct smu_context *smu,
*count = clk_table->NumDfPstatesEnabled;
break;
default:
-   break;
+   return -EINVAL;
}
 
return 0;
-- 
2.25.1



[PATCH 20/22] drm/amd/pm: fix get dpm level count for smu13

2024-05-09 Thread Jesse Zhang
Return -EINVAL for invalid clk types so that callers can detect the error.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c
index 59854465d711..9c2c43bfed0b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_5_ppt.c
@@ -643,7 +643,7 @@ static int smu_v13_0_5_get_dpm_level_count(struct smu_context *smu,
*count = clk_table->NumDfPstatesEnabled;
break;
default:
-   break;
+   return -EINVAL;
}
 
return 0;
-- 
2.25.1



[PATCH 19/22] drm/amdgpu: Fix the warning division or modulo by zero for the variable num_xcc_per_xcp

2024-05-09 Thread Jesse Zhang
Dividing by num_xcc_per_xcp, which may be zero, is undefined behavior,
so bail out when it is zero.
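A minimal C sketch of the guard. num_partitions() is a hypothetical helper, not the driver's code. In C, integer division by zero is undefined behavior, so the divisor has to be checked before the division, not after it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: number of partitions when num_xcc XCCs are grouped
 * num_xcc_per_xcp at a time. A zero (or negative) divisor is rejected
 * before dividing, because integer division by zero is undefined. */
static int num_partitions(int num_xcc, int num_xcc_per_xcp)
{
	if (num_xcc_per_xcp <= 0)
		return -EINVAL;
	return num_xcc / num_xcc_per_xcp;
}
```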

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
index 414ea3f560a7..5752c6760992 100644
--- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
+++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
@@ -522,6 +522,9 @@ static int aqua_vanjaram_switch_partition_mode(struct amdgpu_xcp_mgr *xcp_mgr,
goto unlock;
 
num_xcc_per_xcp = __aqua_vanjaram_get_xcc_per_xcp(xcp_mgr, mode);
+   if (!num_xcc_per_xcp)
+   goto unlock;
+
if (adev->gfx.funcs->switch_partition_mode)
adev->gfx.funcs->switch_partition_mode(xcp_mgr->adev,
   num_xcc_per_xcp);
-- 
2.25.1



[PATCH 18/22] drm/amd/pm: check negative return for table entries

2024-05-09 Thread Jesse Zhang
hwmgr->hwmgr_func->get_num_of_pp_table_entries(hwmgr) may return a
negative error code, so store the result in a signed int and treat any
value <= 0 as an unusable table.
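The C sketch below shows why the type change matters. get_num_entries() is a hypothetical stand-in for get_num_of_pp_table_entries(). When a negative errno is stored in an unsigned int, it wraps to a large non-zero value, so the old "== 0" test lets the error through. A signed variable with a "<= 0" test rejects it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for get_num_of_pp_table_entries(): may fail. */
static int get_num_entries(int fail)
{
	return fail ? -EINVAL : 5;
}

/* Old pattern: -EINVAL wraps to a huge unsigned value and passes "!= 0". */
static int table_usable_unsigned(int fail)
{
	unsigned int entries = get_num_entries(fail);

	return entries != 0;
}

/* New pattern: keep the result signed so errors and zero are rejected. */
static int table_usable_signed(int fail)
{
	int entries = get_num_entries(fail);

	return entries > 0;
}
```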

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
index f4bd8e9357e2..4433ec4e9cf2 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/pp_psm.c
@@ -30,9 +30,8 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
 {
int result;
unsigned int i;
-   unsigned int table_entries;
struct pp_power_state *state;
-   int size;
+   int size, table_entries;
 
if (hwmgr->hwmgr_func->get_num_of_pp_table_entries == NULL)
return 0;
@@ -45,7 +44,7 @@ int psm_init_power_state_table(struct pp_hwmgr *hwmgr)
hwmgr->ps_size = size = hwmgr->hwmgr_func->get_power_state_size(hwmgr) +
  sizeof(struct pp_power_state);
 
-   if (table_entries == 0 || size == 0) {
+   if (table_entries <= 0 || size == 0) {
pr_warn("Please check whether power state management is supported on this asic\n");
return 0;
}
-- 
2.25.1



[PATCH 17/22] drm/amdgpu: fix the warning bad bit shift operation for aca_error_type type

2024-05-09 Thread Jesse Zhang
Filter out invalid (negative) ACA error types before using them as the
shift count in BIT(type).
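A C sketch of the same filtering, with a few assumptions spelled out. The error-type enum is hypothetical, and BIT() is redefined locally to mirror the kernel macro (1UL << (nr)). Shifting by a negative count is undefined. Shifting by a count at least as wide as the type is also undefined, so the sketch rejects both cases. The patch itself only adds the negative check.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-in for the kernel's BIT(nr), for this sketch only. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical error-type enum with a negative "invalid" value. */
enum err_type {
	ERR_TYPE_INVALID = -1,
	ERR_TYPE_UE = 0,
	ERR_TYPE_CE,
	ERR_TYPE_DEFERRED,
};

/* Reject out-of-range types before BIT(type) is ever evaluated. */
static bool type_enabled(enum err_type type, unsigned long mask)
{
	if ((int)type < 0 || (int)type >= (int)(sizeof(mask) * CHAR_BIT))
		return false;
	return (BIT(type) & mask) != 0;
}
```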

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index 28febf33fb1b..9e3560c190e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -534,7 +534,7 @@ int amdgpu_aca_get_error_data(struct amdgpu_device *adev, struct aca_handle *han
if (aca_handle_is_valid(handle))
return -EOPNOTSUPP;
 
-   if (!(BIT(type) & handle->mask))
+   if ((type < 0) || (!(BIT(type) & handle->mask)))
return  0;
 
return __aca_get_error_data(adev, handle, type, err_data, qctx);
-- 
2.25.1



[PATCH 16/22] drm/amd/pm: fix enum type compared against 0

2024-05-09 Thread Jesse Zhang
The comparison type < 0 is never true because the enum is treated as an
unsigned type, so drop it.
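Here is a C sketch of a bounds-checked name lookup; the enum and table are hypothetical. When an enum has no negative enumerators, GCC gives it an unsigned underlying type, so "type < 0" can never be true. The C standard leaves that choice implementation-defined, though. Casting to unsigned before the upper-bound check rejects stray negative values on either kind of compiler. Patch 15/22 applies the same reasoning to smu_get_feature_name().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message enum without negative enumerators. */
enum msg_type { MSG_TEST, MSG_GET_VERSION, MSG_RESET, MSG_MAX_COUNT };

static const char *const msg_names[MSG_MAX_COUNT] = {
	"TestMessage", "GetVersion", "Reset",
};

static const char *msg_name(enum msg_type type)
{
	/* One unsigned comparison covers both "too large" and "negative". */
	if ((unsigned int)type >= MSG_MAX_COUNT)
		return "unknown smu message";
	return msg_names[type];
}
```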

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index d439b95bfb79..602aa6941231 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -56,7 +56,7 @@ static const char * const __smu_message_names[] = {
 static const char *smu_get_message_name(struct smu_context *smu,
enum smu_message_type type)
 {
-   if (type < 0 || type >= SMU_MSG_MAX_COUNT)
+   if (type >= SMU_MSG_MAX_COUNT)
return "unknown smu message";
 
return __smu_message_names[type];
-- 
2.25.1



[PATCH 12/22] drm/amd/pm: remove logically dead code

2024-05-09 Thread Jesse Zhang
Execution cannot reach the case POWER_STATE_TYPE_BALANCED branch, so remove it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c
index 60377747bab4..e861355ebd75 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/legacy_dpm.c
@@ -831,15 +831,6 @@ static struct amdgpu_ps *amdgpu_dpm_pick_power_state(struct amdgpu_device *adev,
return ps;
}
break;
-   case POWER_STATE_TYPE_BALANCED:
-   if (ui_class == ATOM_PPLIB_CLASSIFICATION_UI_BALANCED) {
-   if (ps->caps & ATOM_PPLIB_SINGLE_DISPLAY_ONLY) {
-   if (single_display)
-   return ps;
-   } else
-   return ps;
-   }
-   break;
case POWER_STATE_TYPE_PERFORMANCE:
if (ui_class == ATOM_PPLIB_CLASSIFICATION_UI_PERFORMANCE) {
if (ps->caps & ATOM_PPLIB_SINGLE_DISPLAY_ONLY) {
-- 
2.25.1



[PATCH 15/22] drm/amd/pm: fix enum feature compared against 0

2024-05-09 Thread Jesse Zhang
The comparison feature < 0 is never true because the enum is treated as an
unsigned type, so drop it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 6d1c3af927ca..d439b95bfb79 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -760,7 +760,7 @@ static const char *__smu_feature_names[] = {
 static const char *smu_get_feature_name(struct smu_context *smu,
enum smu_feature_mask feature)
 {
-   if (feature < 0 || feature >= SMU_FEATURE_COUNT)
+   if (feature >= SMU_FEATURE_COUNT)
return "unknown smu feature";
return __smu_feature_names[feature];
 }
-- 
2.25.1



[PATCH 14/22] drm/amdgpu: remove unused code

2024-05-09 Thread Jesse Zhang
The same code runs whether or not err is set: the if-branch only jumps
to the out label, which immediately follows it, so remove the redundant check.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 541dbd70d8c7..16d3deac375d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -963,8 +963,6 @@ static int gfx_v7_0_init_microcode(struct amdgpu_device *adev)
 
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", chip_name);
err = amdgpu_ucode_request(adev, >gfx.rlc_fw, fw_name);
-   if (err)
-   goto out;
 out:
if (err) {
pr_err("gfx7: Failed to load firmware \"%s\"\n", fw_name);
-- 
2.25.1



[PATCH 11/22] drm/amdgpu: remove structurally dead code for amd_gmc

2024-05-09 Thread Jesse Zhang
The final return sysfs_emit(buf, "UNKNOWN\n") cannot be reached, because
every case of the switch, including default, already returns.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index a5f970fec242..f8ed886ffca3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -1148,8 +1148,6 @@ static ssize_t current_memory_partition_show(
default:
return sysfs_emit(buf, "UNKNOWN\n");
}
-
-   return sysfs_emit(buf, "UNKNOWN\n");
 }
 
 static DEVICE_ATTR_RO(current_memory_partition);
-- 
2.25.1



[PATCH 13/22] drm/amd/pm: remove logically dead code for renoir

2024-05-09 Thread Jesse Zhang
The clk_type value governing this switch can never be SMU_GFXCLK or
SMU_SCLK, so remove those case labels.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index aeeba0d95c9c..cc0504b063fa 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -585,8 +585,6 @@ static int renoir_print_clk_levels(struct smu_context *smu,
}
 
switch (clk_type) {
-   case SMU_GFXCLK:
-   case SMU_SCLK:
case SMU_SOCCLK:
case SMU_MCLK:
case SMU_DCEFCLK:
-- 
2.25.1



[PATCH 10/22] drm/amdgpu: remove structurally dead code

2024-05-09 Thread Jesse Zhang
The trailing return "UNKNOWN"; cannot be reached, because the default
case of the switch already returns it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 9a946f0e015c..109f471ff315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -554,8 +554,6 @@ static inline const char *amdgpu_gfx_compute_mode_desc(int mode)
default:
return "UNKNOWN";
}
-
-   return "UNKNOWN";
 }
 
 #endif
-- 
2.25.1



[PATCH 08/22] drm/amd/pm: check the return of send smc msg for smu_v13

2024-05-09 Thread Jesse Zhang
Setting the SMU workload mask may fail, so check the return value.
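A C sketch of the capture, log, and propagate pattern used in this patch and in patches 07/22 and 06/22. send_msg_with_param() is a hypothetical stand-in for smu_cmn_send_smc_msg_with_param(), assumed to return 0 or a negative errno. The message id 0x2a is arbitrary, and fprintf() plays the role of dev_err().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical firmware-message sender: 0 on success, negative errno else. */
static int send_msg_with_param(int msg, unsigned int param, int fail)
{
	(void)msg;
	(void)param;
	return fail ? -EIO : 0;
}

static int set_workload(int workload_type, int fail)
{
	int ret;

	if (workload_type < 0)
		return -EINVAL;

	/* Keep the result instead of discarding it, log it, and return it. */
	ret = send_msg_with_param(0x2a, 1u << workload_type, fail);
	if (ret)
		fprintf(stderr, "[%s] Failed to set workload mask!\n", __func__);

	return ret;
}
```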

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
index e996a0a4d33e..dcb68ab51fa0 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
@@ -2495,8 +2495,10 @@ static int smu_v13_0_7_set_power_profile_mode(struct smu_context *smu, long *inp
   smu->power_profile_mode);
if (workload_type < 0)
return -EINVAL;
-   smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
1 << workload_type, NULL);
+   if (ret)
+   dev_err(smu->adev->dev, "[%s] Failed to set work load mask!", __func__);
 
return ret;
 }
-- 
2.25.1



[PATCH 09/22] drm/amd/pm: check specific index for smu13

2024-05-09 Thread Jesse Zhang
Check the ASIC-specific message index for invalid (negative) values before using it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 051092f1b1b4..7c343dd12a7f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -2336,6 +2336,8 @@ static int smu_v13_0_6_mode2_reset(struct smu_context *smu)
 
index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
   SMU_MSG_GfxDeviceDriverReset);
+   if (index < 0)
+   ret = -EINVAL;
 
mutex_lock(>message_lock);
 
-- 
2.25.1



[PATCH 07/22] drm/amd/pm: check the return of send smc msg for navi10

2024-05-09 Thread Jesse Zhang
Setting the SMU workload mask may fail, so check the return value.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index c06e0d6e3017..f30f1facc0f6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2081,9 +2081,11 @@ static int navi10_set_power_profile_mode(struct smu_context *smu, long *input, u
   smu->power_profile_mode);
if (workload_type < 0)
return -EINVAL;
-   smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
1 << workload_type, NULL);
-
+   if (ret)
+   dev_err(smu->adev->dev, "[%s] Failed to set work load mask!", __func__);
+   
return ret;
 }
 
-- 
2.25.1



[PATCH 06/22] drm/amd/pm: check the return of send smc msg for sienna_cichlid

2024-05-09 Thread Jesse Zhang
Setting the SMU workload mask may fail, so check the return value.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index e426f457a017..e7ef8cb3a791 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -1782,8 +1782,10 @@ static int sienna_cichlid_set_power_profile_mode(struct smu_context *smu, long *
   smu->power_profile_mode);
if (workload_type < 0)
return -EINVAL;
-   smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
1 << workload_type, NULL);
+   if (ret)
+   dev_err(smu->adev->dev, "[%s] Failed to set work load mask!", __func__);
 
return ret;
 }
-- 
2.25.1



[PATCH 05/22] drm/amd/pm: check specific index for aldebaran

2024-05-09 Thread Jesse Zhang
Check the ASIC-specific message index for invalid (negative) values before using it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index ce941fbb9cfb..a22eb6bbb05e 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1886,7 +1886,8 @@ static int aldebaran_mode2_reset(struct smu_context *smu)
 
index = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_MSG,
SMU_MSG_GfxDeviceDriverReset);
-
+   if (index < 0 )
+   return -EINVAL;
mutex_lock(>message_lock);
if (smu->smc_fw_version >= 0x00441400) {
ret = smu_cmn_send_msg_without_waiting(smu, (uint16_t)index, SMU_RESET_MODE_2);
-- 
2.25.1



[PATCH 03/22] drm/amdgpu: fix the warning dereferencing hive

2024-05-09 Thread Jesse Zhang
amdgpu_get_xgmi_hive() may return NULL, so check hive before dereferencing it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 37820dd03cab..5a648a657dc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1362,6 +1362,9 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
uint8_t dst_num_links = node_info.num_links;
 
hive = amdgpu_get_xgmi_hive(psp->adev);
+   if (WARN_ON(!hive))
+   return;
+
list_for_each_entry(mirror_adev, >device_list, gmc.xgmi.head) {
struct psp_xgmi_topology_info *mirror_top_info;
int j;
-- 
2.25.1



[PATCH 04/22] drm/amd: fix the warning about an unchecked return value for sdma_v7

2024-05-09 Thread Jesse Zhang
Check that amdgpu_ring_alloc() succeeds before emitting the trailing fence
in sdma_v7_0_ring_preempt_ib().

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
index 0b5af1c50461..7db53a96cff0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c
@@ -1347,7 +1347,11 @@ static int sdma_v7_0_ring_preempt_ib(struct amdgpu_ring *ring)
 
/* emit the trailing fence */
ring->trail_seq += 1;
-   amdgpu_ring_alloc(ring, 10);
+   r = amdgpu_ring_alloc(ring, 10);
+   if (r) {
+   DRM_ERROR("ring %d failed to be allocated \n", ring->idx);
+   return r;
+   }
sdma_v7_0_ring_emit_fence(ring, ring->trail_fence_gpu_addr,
  ring->trail_seq, 0);
amdgpu_ring_commit(ring);
-- 
2.25.1



[PATCH 02/22] drm/amdgpu: fix the warning dereferencing obj for nbio_v7_4

2024-05-09 Thread Jesse Zhang
If the ras_manager obj is NULL, don't print the NBIO error data.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
index fe18df10daaa..26e5885db9b7 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c
@@ -383,7 +383,7 @@ static void nbio_v7_4_handle_ras_controller_intr_no_bifring(struct amdgpu_device
else
WREG32_SOC15(NBIO, 0, mmBIF_DOORBELL_INT_CNTL, bif_doorbell_intr_cntl);
 
-   if (!ras->disable_ras_err_cnt_harvest) {
+   if (!ras->disable_ras_err_cnt_harvest && obj) {
/*
 * clear error status after ras_controller_intr
 * according to hw team and count ue number
-- 
2.25.1



[PATCH 01/22] drm/amdgpu: fix dereference after null check

2024-05-09 Thread Jesse Zhang
Check the hive pointer before using it.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 77f6fd50002a..00fe3c2d5431 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5725,7 +5725,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 * to put adev in the 1st position.
 */
INIT_LIST_HEAD(_list);
-   if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
+   if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1) && hive) {
list_for_each_entry(tmp_adev, >device_list, gmc.xgmi.head) {
list_add_tail(_adev->reset_list, _list);
if (adev->shutdown)
-- 
2.25.1



[PATCH 2/2] drm/amd/pm: enable UMD Pstate profile level for renoir

2024-05-06 Thread Jesse Zhang
This patch enables the UMD P-state profile levels for the
renoir_set_performance_level() interface:

 - profile_min_sclk
 - profile_min_fclk

Signed-off-by: Jesse Zhang 
Suggested-by: Tim Huang 
---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 58 +++
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 8908bbb3ff1f..e56b7afb5b78 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -928,11 +928,55 @@ static int renoir_set_peak_clock_by_device(struct smu_context *smu)
return ret;
 }
 
+static int renior_set_dpm_profile_freq(struct smu_context *smu,
+   enum amd_dpm_forced_level level,
+   enum smu_clk_type clk_type)
+{
+   int ret = 0;
+   uint32_t sclk = 0, socclk = 0, fclk = 0;
+
+   switch (clk_type) {
+   case SMU_GFXCLK:
+   case SMU_SCLK:
+   sclk = RENOIR_UMD_PSTATE_GFXCLK;
+   if (level == AMD_DPM_FORCED_LEVEL_PROFILE_PEAK)
+   renoir_get_dpm_ultimate_freq(smu, SMU_SCLK, NULL, );
+   else if (level == AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK)
+   renoir_get_dpm_ultimate_freq(smu, SMU_SCLK, , NULL);
+   break;
+   case SMU_SOCCLK:
+   socclk = RENOIR_UMD_PSTATE_SOCCLK;
+   if (level == AMD_DPM_FORCED_LEVEL_PROFILE_PEAK)
+   renoir_get_dpm_ultimate_freq(smu, SMU_SOCCLK, NULL, );
+   break;
+   case SMU_FCLK:
+   fclk = RENOIR_UMD_PSTATE_FCLK;
+   if (level == AMD_DPM_FORCED_LEVEL_PROFILE_PEAK)
+   renoir_get_dpm_ultimate_freq(smu, SMU_FCLK, NULL, );
+   else if (level == AMD_DPM_FORCED_LEVEL_PROFILE_MIN_MCLK)
+   renoir_get_dpm_ultimate_freq(smu, SMU_FCLK, , NULL);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   if (sclk)
+   ret = smu_v12_0_set_soft_freq_limited_range(smu, SMU_SCLK, sclk, sclk);
+
+   if (socclk)
+   ret = smu_v12_0_set_soft_freq_limited_range(smu, SMU_SOCCLK, socclk, socclk);
+
+   if (fclk)
+   ret = smu_v12_0_set_soft_freq_limited_range(smu, SMU_FCLK, fclk, fclk);
+
+   return ret;
+}
+
 static int renoir_set_performance_level(struct smu_context *smu,
enum amd_dpm_forced_level level)
 {
int ret = 0;
-   uint32_t sclk_mask, mclk_mask, soc_mask;
 
switch (level) {
case AMD_DPM_FORCED_LEVEL_HIGH:
@@ -1012,15 +1056,9 @@ static int renoir_set_performance_level(struct smu_context *smu,
smu->gfx_actual_hard_min_freq = smu->gfx_default_hard_min_freq;
smu->gfx_actual_soft_max_freq = smu->gfx_default_soft_max_freq;
 
-   ret = renoir_get_profiling_clk_mask(smu, level,
-   _mask,
-   _mask,
-   _mask);
-   if (ret)
-   return ret;
-   renoir_force_clk_levels(smu, SMU_SCLK, 1 << sclk_mask);
-   renoir_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
-   renoir_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask);
+   renior_set_dpm_profile_freq(smu, level, SMU_SCLK);
+   renior_set_dpm_profile_freq(smu, level, SMU_MCLK);
+   renior_set_dpm_profile_freq(smu, level, SMU_SOCCLK);
break;
case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK:
smu->gfx_actual_hard_min_freq = smu->gfx_default_hard_min_freq;
-- 
2.25.1



[PATCH 1/2] drm/amd/pm: revert the commit 576bffd10d01

2024-05-06 Thread Jesse Zhang
Revert commit 576bffd10d01; an updated patch will follow.

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 32 +++
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 36a49cfc22e4..8908bbb3ff1f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -835,20 +835,10 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-/* =  0: min_freq
- * =  1: UMD_PSTATE_CLK
- * >= 2: max_freq
- */
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq,
-   soft_max_level == 0 ? min_freq :
-   soft_max_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq, max_freq, NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq,
-   soft_min_level == 0 ? min_freq :
-   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq, min_freq, NULL);
if (ret)
return ret;
break;
@@ -860,21 +850,10 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-   /* mclk levels are in reverse order
-* =  0: max_freq
-* =  1: UMD_PSTATE_CLK
-* >= 2: min_freq
-*/
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq,
-   soft_max_level >= 2 ? min_freq :
-   soft_max_level == 1 ? RENOIR_UMD_PSTATE_FCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq, max_freq, NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq,
-   soft_min_level >= 2  ? min_freq :
-   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq, min_freq, NULL);
if (ret)
return ret;
break;
@@ -953,8 +932,7 @@ static int renoir_set_performance_level(struct smu_context *smu,
enum amd_dpm_forced_level level)
 {
int ret = 0;
-   /* default mask is UMD PSTATE CLK */
-   uint32_t sclk_mask = 1, mclk_mask = 1, soc_mask = 1;
+   uint32_t sclk_mask, mclk_mask, soc_mask;
 
switch (level) {
case AMD_DPM_FORCED_LEVEL_HIGH:
-- 
2.25.1



[PATCH 1/2] drm/amd/pm: revert the commit 576bffd10d01

2024-05-06 Thread Jesse Zhang
Commit 576bffd10d01 is not needed; an updated patch will follow.

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 32 +++
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 36a49cfc22e4..8908bbb3ff1f 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -835,20 +835,10 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-/* =  0: min_freq
- * =  1: UMD_PSTATE_CLK
- * >= 2: max_freq
- */
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq,
-   soft_max_level == 0 ? min_freq :
-   soft_max_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq, max_freq, NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq,
-   soft_min_level == 0 ? min_freq :
-   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq, min_freq, NULL);
if (ret)
return ret;
break;
@@ -860,21 +850,10 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-   /* mclk levels are in reverse order
-* =  0: max_freq
-* =  1: UMD_PSTATE_CLK
-* >= 2: min_freq
-*/
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq,
-   soft_max_level >= 2 ? min_freq :
-   soft_max_level == 1 ? RENOIR_UMD_PSTATE_FCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq, max_freq, NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq,
-   soft_min_level >= 2  ? min_freq :
-   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
-   NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq, min_freq, NULL);
if (ret)
return ret;
break;
@@ -953,8 +932,7 @@ static int renoir_set_performance_level(struct smu_context *smu,
enum amd_dpm_forced_level level)
 {
int ret = 0;
-   /* default mask is UMD PSTATE CLK */
-   uint32_t sclk_mask = 1, mclk_mask = 1, soc_mask = 1;
+   uint32_t sclk_mask, mclk_mask, soc_mask;
 
switch (level) {
case AMD_DPM_FORCED_LEVEL_HIGH:
-- 
2.25.1



[PATCH] drm/amd/pm: fix the uninitialized scalar variable warning

2024-04-30 Thread Jesse Zhang
Fix the warning about using the uninitialized values
sclk_mask, mclk_mask and soc_mask.
v2: Set the default variables to UMD PSTATE (Tim Huang)
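The comments in this patch describe how a DPM level selects a frequency: level 0 picks min_freq, level 1 picks the UMD P-state clock, and level >= 2 picks max_freq. For the memory/fabric clock the order is reversed. The C sketch below models that mapping. level_to_freq() and the example frequencies are hypothetical; only the mapping rules come from the diff's comments.

```c
#include <assert.h>
#include <stdint.h>

/* Level encoding from the patch comments:
 *   normal:   0 -> min_freq, 1 -> UMD P-state clock, >= 2 -> max_freq
 *   reversed: 0 -> max_freq, 1 -> UMD P-state clock, >= 2 -> min_freq
 * "reversed" models the memory/fabric clock levels. */
static uint32_t level_to_freq(uint32_t level, uint32_t min_freq,
			      uint32_t umd_pstate, uint32_t max_freq,
			      int reversed)
{
	if (level == 1)
		return umd_pstate;
	if (level == 0)
		return reversed ? max_freq : min_freq;
	return reversed ? min_freq : max_freq;
}
```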

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c   | 32 ---
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 8908bbb3ff1f..36a49cfc22e4 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -835,10 +835,20 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq, max_freq, NULL);
+/* =  0: min_freq
+ * =  1: UMD_PSTATE_CLK
+ * >= 2: max_freq
+ */
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxSocclkByFreq,
+   soft_max_level == 0 ? min_freq :
+   soft_max_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
+   NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq, min_freq, NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinSocclkByFreq,
+   soft_min_level == 0 ? min_freq :
+   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
+   NULL);
if (ret)
return ret;
break;
@@ -850,10 +860,21 @@ static int renoir_force_clk_levels(struct smu_context *smu,
ret = renoir_get_dpm_clk_limited(smu, clk_type, soft_max_level, _freq);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq, max_freq, NULL);
+   /* mclk levels are in reverse order
+* =  0: max_freq
+* =  1: UMD_PSTATE_CLK
+* >= 2: min_freq
+*/
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetSoftMaxFclkByFreq,
+   soft_max_level >= 2 ? min_freq :
+   soft_max_level == 1 ? RENOIR_UMD_PSTATE_FCLK : max_freq,
+   NULL);
if (ret)
return ret;
-   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq, min_freq, NULL);
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetHardMinFclkByFreq,
+   soft_min_level >= 2  ? min_freq :
+   soft_min_level == 1 ? RENOIR_UMD_PSTATE_SOCCLK : max_freq,
+   NULL);
if (ret)
return ret;
break;
@@ -932,7 +953,8 @@ static int renoir_set_performance_level(struct smu_context *smu,
enum amd_dpm_forced_level level)
 {
int ret = 0;
-   uint32_t sclk_mask, mclk_mask, soc_mask;
+   /* default mask is UMD PSTATE CLK */
+   uint32_t sclk_mask = 1, mclk_mask = 1, soc_mask = 1;
 
switch (level) {
case AMD_DPM_FORCED_LEVEL_HIGH:
-- 
2.25.1



[PATCH V2] drm/amd/pm: fix warning using uninitialized value of max_vid_step

2024-04-29 Thread Jesse Zhang
Check the return value of pp_atomfwctrl_get_voltage_table_v4(),
as it may fail and leave max_vid_step uninitialized.
V2: change the check condition (Tim Huang)
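The sketch below explains why the condition changed between versions. It assumes PP_ASSERT_WITH_CODE(cond, msg, code) prints msg and runs code when cond is *false*, and it uses a simplified re-creation of the macro built on fprintf() rather than the kernel header. Under that assumption, the first version's PP_ASSERT_WITH_CODE(result < 0, ...) bails out on success and continues on failure. V2's !result does the opposite, which is correct. get_table() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified re-creation of the powerplay assert-with-code pattern:
 * when cond is false, print msg and execute code. */
#define PP_ASSERT_WITH_CODE(cond, msg, code)		\
	do {						\
		if (!(cond)) {				\
			fprintf(stderr, "%s\n", msg);	\
			code;				\
		}					\
	} while (0)

/* Hypothetical table fetch: 0 on success, negative errno on failure. */
static int get_table(int fail)
{
	return fail ? -5 : 0;
}

/* Each returns 1 if execution reached the use of the table. */
static int init_v1(int fail)
{
	int result = get_table(fail);

	PP_ASSERT_WITH_CODE(result < 0, "Failed to get voltage tables!", return result);
	return 1;
}

static int init_v2(int fail)
{
	int result = get_table(fail);

	PP_ASSERT_WITH_CODE(!result, "Failed to get voltage table!", return result);
	return 1;
}
```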

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
index b602059436a8..d004cdbe97b4 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
@@ -2573,8 +2573,11 @@ static int vega10_init_smc_table(struct pp_hwmgr *hwmgr)
}
}
 
-   pp_atomfwctrl_get_voltage_table_v4(hwmgr, VOLTAGE_TYPE_VDDC,
+   result = pp_atomfwctrl_get_voltage_table_v4(hwmgr, VOLTAGE_TYPE_VDDC,
VOLTAGE_OBJ_SVID2,  _table);
+   PP_ASSERT_WITH_CODE(!result,
+   "Failed to get voltage table!",
+   return result);
pp_table->MaxVidStep = voltage_table.max_vid_step;
 
pp_table->GfxDpmVoltageMode =
-- 
2.25.1



[PATCH] drm/amd/pm: fix warning using uninitialized value of max_vid_step

2024-04-28 Thread Jesse Zhang
Check the return value of pp_atomfwctrl_get_voltage_table_v4(),
as it may fail and leave max_vid_step uninitialized.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
index b602059436a8..70c711cec897 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
@@ -2573,8 +2573,12 @@ static int vega10_init_smc_table(struct pp_hwmgr *hwmgr)
}
}
 
-   pp_atomfwctrl_get_voltage_table_v4(hwmgr, VOLTAGE_TYPE_VDDC,
+   result = pp_atomfwctrl_get_voltage_table_v4(hwmgr, VOLTAGE_TYPE_VDDC,
VOLTAGE_OBJ_SVID2,  &voltage_table);
+   PP_ASSERT_WITH_CODE(result < 0,
+   "Failed to get voltage tables!",
+   return result);
+
pp_table->MaxVidStep = voltage_table.max_vid_step;
 
pp_table->GfxDpmVoltageMode =
-- 
2.25.1



[PATCH 3/3 V2] drm/amd/pm: fix the uninitialized scalar variable warning

2024-04-28 Thread Jesse Zhang
Fix warning for using uninitialized values sclk_mask, mclk_mask and soc_mask.
 v2: Initialize the variables in renoir_get_profiling_clk_mask() (Tim Huang)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 8908bbb3ff1f..546a2268823a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -253,6 +253,10 @@ static int renoir_get_profiling_clk_mask(struct 
smu_context *smu,
 uint32_t *mclk_mask,
 uint32_t *soc_mask)
 {
+   *sclk_mask = 0;
+   /* mclk levels are in reverse order */
+   *mclk_mask = NUM_MEMCLK_DPM_LEVELS - 1;
+   *soc_mask = 0;
 
if (level == AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK) {
if (sclk_mask)
-- 
2.25.1
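The new assignments in this patch dereference sclk_mask, mclk_mask and soc_mask unconditionally, while the existing code right after them still tests `if (sclk_mask)` before writing, which suggests callers may pass NULL. A NULL-safe variant of the same "defaults first" initialization is sketched here; the enum, the NUM_MEMCLK_DPM_LEVELS value and the override logic are hypothetical and do not reproduce Renoir's profiling rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical level count; Renoir defines its own. */
#define NUM_MEMCLK_DPM_LEVELS 4

enum profile_level { PROFILE_MIN_SCLK, PROFILE_MIN_MCLK, PROFILE_STANDARD };

static void get_profiling_clk_mask(enum profile_level level,
				   uint32_t *sclk_mask,
				   uint32_t *mclk_mask,
				   uint32_t *soc_mask)
{
	/* Defaults first, so every non-NULL out-parameter is defined. */
	if (sclk_mask)
		*sclk_mask = 0;
	if (mclk_mask)	/* mclk levels are in reverse order */
		*mclk_mask = NUM_MEMCLK_DPM_LEVELS - 1;
	if (soc_mask)
		*soc_mask = 0;

	/* Hypothetical override: only the standard profile moves off the defaults. */
	if (level == PROFILE_STANDARD) {
		if (sclk_mask)
			*sclk_mask = 1;
		if (mclk_mask)
			*mclk_mask = 1;
		if (soc_mask)
			*soc_mask = 1;
	}
}
```

Callers that only need one mask can pass NULL for the others without risking a NULL dereference.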



[PATCH 1/3 V2] drm/amd/pm: Fix negative array index read warning for pptable->DpmDescriptor

2024-04-28 Thread Jesse Zhang
Avoid using negative values of clk_index
as an index into the array pptable->DpmDescriptor.

V2: fix clk_index return check (Tim Huang)

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 27 ++-
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 5a68d365967f..c06e0d6e3017 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1219,19 +1219,22 @@ static int navi10_get_current_clk_freq_by_table(struct 
smu_context *smu,
   value);
 }
 
-static bool navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum smu_clk_type clk_type)
+static int navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum smu_clk_type clk_type)
 {
PPTable_t *pptable = smu->smu_table.driver_pptable;
DpmDescriptor_t *dpm_desc = NULL;
-   uint32_t clk_index = 0;
+   int clk_index = 0;
 
clk_index = smu_cmn_to_asic_specific_index(smu,
   CMN2ASIC_MAPPING_CLK,
   clk_type);
+   if (clk_index < 0)
+   return clk_index;
+
dpm_desc = &pptable->DpmDescriptor[clk_index];
 
/* 0 - Fine grained DPM, 1 - Discrete DPM */
-   return dpm_desc->SnapToDiscrete == 0;
+   return dpm_desc->SnapToDiscrete == 0 ? 1 : 0;
 }
 
 static inline bool navi10_od_feature_is_supported(struct 
smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap)
@@ -1287,7 +1290,11 @@ static int navi10_emit_clk_levels(struct smu_context 
*smu,
if (ret)
return ret;
 
-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu,
  clk_type, 
i, );
@@ -1496,7 +1503,11 @@ static int navi10_print_clk_levels(struct smu_context 
*smu,
if (ret)
return size;
 
-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu, 
clk_type, i, );
if (ret)
@@ -1665,7 +1676,11 @@ static int navi10_force_clk_levels(struct smu_context 
*smu,
case SMU_UCLK:
case SMU_FCLK:
/* There is only 2 levels for fine grained DPM */
-   if (navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (ret) {
soft_max_level = (soft_max_level >= 1 ? 1 : 0);
soft_min_level = (soft_min_level >= 1 ? 1 : 0);
}
-- 
2.25.1
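The V2 patch turns the helper into a tri-state function: 1 for fine-grained DPM, 0 for discrete DPM, or a negative errno when the clock type has no descriptor, and every caller checks `ret < 0` before treating the result as a boolean. A minimal sketch of that convention, assuming a hypothetical four-entry descriptor table and a map_clk_index() stand-in for smu_cmn_to_asic_specific_index(); count_levels() is likewise a hypothetical caller.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_DPM_DESC 4

/* Hypothetical descriptor: 0 - fine grained DPM, 1 - discrete DPM. */
struct dpm_descriptor {
	uint8_t snap_to_discrete;
};

static const struct dpm_descriptor dpm_desc_table[NUM_DPM_DESC] = {
	{ 0 }, { 1 }, { 0 }, { 1 },
};

/* Hypothetical mapping that fails with a negative errno. */
static int map_clk_index(int clk_type)
{
	if (clk_type < 0 || clk_type >= NUM_DPM_DESC)
		return -EINVAL;
	return clk_type;
}

/* 1 = fine grained, 0 = discrete, < 0 = error (never used as an index). */
static int is_support_fine_grained_dpm(int clk_type)
{
	int clk_index = map_clk_index(clk_type);

	if (clk_index < 0)
		return clk_index;

	return dpm_desc_table[clk_index].snap_to_discrete == 0 ? 1 : 0;
}

/* Caller pattern: propagate the error before using ret as a bool. */
static int count_levels(int clk_type, int discrete_count)
{
	int ret = is_support_fine_grained_dpm(clk_type);

	if (ret < 0)
		return ret;
	/* There are only 2 levels for fine grained DPM. */
	return ret ? 2 : discrete_count;
}
```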



[PATCH 2/2 V2] drm/amd/pm: fix uninitialized variable warning

2024-04-28 Thread Jesse Zhang
Check the return value of smum_send_msg_to_smc(),
as it may fail to initialize the variable.

Signed-off-by: Jesse Zhang 
---
 .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  |  8 +--
 .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   | 21 ---
 .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 19 +++--
 .../amd/pm/powerplay/smumgr/smu10_smumgr.c|  5 -
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
index 02ba68d7c654..f9f016cb60ce 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
@@ -1310,13 +1310,17 @@ static int smu10_read_sensor(struct pp_hwmgr *hwmgr, 
int idx,
 
switch (idx) {
case AMDGPU_PP_SENSOR_GFX_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &sclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &sclk);
+   if (ret)
+   break;
/* in units of 10KHZ */
*((uint32_t *)value) = sclk * 100;
*size = 4;
break;
case AMDGPU_PP_SENSOR_GFX_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &mclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &mclk);
+   if (ret)
+   break;
/* in units of 10KHZ */
*((uint32_t *)value) = mclk * 100;
*size = 4;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 1fcd4451001f..5c95eda6cbd2 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -4000,6 +4000,7 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
uint32_t offset, val_vid;
struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
struct amdgpu_device *adev = hwmgr->adev;
+   int ret = 0;
 
/* size must be at least 4 bytes for all sensors */
if (*size < 4)
@@ -4007,12 +4008,16 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
 
switch (idx) {
case AMDGPU_PP_SENSOR_GFX_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &sclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &sclk);
+   if (ret)
+   return ret;
*((uint32_t *)value) = sclk;
*size = 4;
return 0;
case AMDGPU_PP_SENSOR_GFX_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &mclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &mclk);
+   if (ret)
+   return ret;
*((uint32_t *)value) = mclk;
*size = 4;
return 0;
@@ -4965,13 +4970,14 @@ static int smu7_print_clock_levels(struct pp_hwmgr 
*hwmgr,
struct smu7_odn_dpm_table *odn_table = &(data->odn_dpm_table);
struct phm_odn_clock_levels *odn_sclk_table = 
&(odn_table->odn_core_clock_dpm_levels);
struct phm_odn_clock_levels *odn_mclk_table = 
&(odn_table->odn_memory_clock_dpm_levels);
-   int size = 0;
+   int size = 0, ret = 0;
uint32_t i, now, clock, pcie_speed;
 
switch (type) {
case PP_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &clock);
-
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &clock);
+   if (ret)
+   return ret;
for (i = 0; i < sclk_table->count; i++) {
if (clock > sclk_table->dpm_levels[i].value)
continue;
@@ -4985,8 +4991,9 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
(i == now) ? "*" : "");
break;
case PP_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &clock);
-
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &clock);
+   if (ret)
+   return ret;
for (i = 0; i < mclk_table->count; i++) {
if (clock > mclk_table->dpm_levels[i].value)
continue;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
index 9f5bd998c6bf..b47e1ab12430 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
@@ -2481,10 +2481,12 @@ static int 
vega10_populate_and_upload_avfs_fuse_
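
The pattern in this patch is to publish a sensor value only after the SMC query has succeeded, so a failed message can never leak an uninitialized local. A user-space sketch of that pattern, assuming a hypothetical send_msg_to_smc() (standing in for smum_send_msg_to_smc()) that returns a negative errno and leaves its output untouched on failure, and that reports the clock in MHz:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical SMC query: on failure it returns a negative errno and
 * leaves *arg untouched, as a timed-out mailbox would. */
static int send_msg_to_smc(int smc_alive, uint32_t *arg)
{
	if (!smc_alive)
		return -EIO;
	*arg = 1500;	/* hypothetical GFX clock in MHz */
	return 0;
}

static int read_gfx_sclk(int smc_alive, void *value, int *size)
{
	uint32_t sclk;
	int ret;

	/* size must be at least 4 bytes */
	if (*size < 4)
		return -EINVAL;

	ret = send_msg_to_smc(smc_alive, &sclk);
	if (ret)
		return ret;	/* never publish an uninitialized sclk */

	/* in units of 10 kHz, as in smu10_read_sensor() */
	*(uint32_t *)value = sclk * 100;
	*size = 4;
	return 0;
}
```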

[PATCH 2/2] drm/amd/pm: fix uninitialized variable warning

2024-04-26 Thread Jesse Zhang
Check the return value of smum_send_msg_to_smc(),
as it may fail to initialize the variable.

Signed-off-by: Jesse Zhang 
---
 .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  |  8 +--
 .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c   | 21 ---
 .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 19 +++--
 .../amd/pm/powerplay/smumgr/smu10_smumgr.c|  5 -
 4 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
index 02ba68d7c654..f9f016cb60ce 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c
@@ -1310,13 +1310,17 @@ static int smu10_read_sensor(struct pp_hwmgr *hwmgr, 
int idx,
 
switch (idx) {
case AMDGPU_PP_SENSOR_GFX_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &sclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetGfxclkFrequency, &sclk);
+   if (ret)
+   break;
/* in units of 10KHZ */
*((uint32_t *)value) = sclk * 100;
*size = 4;
break;
case AMDGPU_PP_SENSOR_GFX_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &mclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_GetFclkFrequency, &mclk);
+   if (ret)
+   break;
/* in units of 10KHZ */
*((uint32_t *)value) = mclk * 100;
*size = 4;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 1fcd4451001f..5c95eda6cbd2 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -4000,6 +4000,7 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
uint32_t offset, val_vid;
struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
struct amdgpu_device *adev = hwmgr->adev;
+   int ret = 0;
 
/* size must be at least 4 bytes for all sensors */
if (*size < 4)
@@ -4007,12 +4008,16 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
 
switch (idx) {
case AMDGPU_PP_SENSOR_GFX_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &sclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &sclk);
+   if (ret)
+   return ret;
*((uint32_t *)value) = sclk;
*size = 4;
return 0;
case AMDGPU_PP_SENSOR_GFX_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &mclk);
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &mclk);
+   if (ret)
+   return ret;
*((uint32_t *)value) = mclk;
*size = 4;
return 0;
@@ -4965,13 +4970,14 @@ static int smu7_print_clock_levels(struct pp_hwmgr 
*hwmgr,
struct smu7_odn_dpm_table *odn_table = &(data->odn_dpm_table);
struct phm_odn_clock_levels *odn_sclk_table = 
&(odn_table->odn_core_clock_dpm_levels);
struct phm_odn_clock_levels *odn_mclk_table = 
&(odn_table->odn_memory_clock_dpm_levels);
-   int size = 0;
+   int size = 0, ret = 0;
uint32_t i, now, clock, pcie_speed;
 
switch (type) {
case PP_SCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &clock);
-
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetSclkFrequency, &clock);
+   if (ret)
+   return ret;
for (i = 0; i < sclk_table->count; i++) {
if (clock > sclk_table->dpm_levels[i].value)
continue;
@@ -4985,8 +4991,9 @@ static int smu7_print_clock_levels(struct pp_hwmgr *hwmgr,
(i == now) ? "*" : "");
break;
case PP_MCLK:
-   smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &clock);
-
+   ret = smum_send_msg_to_smc(hwmgr, PPSMC_MSG_API_GetMclkFrequency, &clock);
+   if (ret)
+   return ret;
for (i = 0; i < mclk_table->count; i++) {
if (clock > mclk_table->dpm_levels[i].value)
continue;
diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
index 9f5bd998c6bf..b47e1ab12430 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c
@@ -2481,10 +2481,12 @@ static int 
vega10_populate_and_upload_avfs_fuse_

[PATCH 1/2] drm/amd/pm: fix the uninitialized scalar variable warning

2024-04-26 Thread Jesse Zhang
Initialize the variable size before calling
hwmgr->hwmgr_func->read_sensor (e.g. smu7_read_sensor).

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c 
b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
index 5fb21a0508cd..ec2b6d0674ed 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/amd_powerplay.c
@@ -102,6 +102,7 @@ static void pp_swctf_delayed_work_handler(struct 
work_struct *work)
uint32_t gpu_temperature, size;
int ret;
 
+   size = sizeof(gpu_temperature);
/*
 * If the hotspot/edge temperature is confirmed as below SW CTF setting 
point
 * after the delay enforced, nothing will be done.
-- 
2.25.1
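read_sensor callbacks such as smu7_read_sensor() treat *size as an in/out parameter and reject anything below 4 bytes, so an uninitialized size makes that check compare garbage. A sketch of the calling convention, with a hypothetical read_sensor() in place of hwmgr->hwmgr_func->read_sensor and a hypothetical temperature reading:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical sensor callback: *size holds the caller's buffer size on
 * entry and the number of bytes written on return. */
static int read_sensor(void *value, uint32_t *size)
{
	/* size must be at least 4 bytes for all sensors */
	if (*size < 4)
		return -EINVAL;
	*(uint32_t *)value = 45000;	/* hypothetical reading */
	*size = 4;
	return 0;
}

static int read_gpu_temperature(uint32_t *temp)
{
	uint32_t gpu_temperature, size;
	int ret;

	/* Without this, read_sensor() would test an indeterminate value. */
	size = sizeof(gpu_temperature);
	ret = read_sensor(&gpu_temperature, &size);
	if (ret)
		return ret;
	*temp = gpu_temperature;
	return 0;
}
```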



[PATCH 3/3] drm/amd/pm: fix the uninitialized scalar variable warning

2024-04-26 Thread Jesse Zhang
Fix warning for using uninitialized values sclk_mask, mclk_mask and soc_mask.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
index 8908bbb3ff1f..10f673b651a0 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu12/renoir_ppt.c
@@ -932,7 +932,7 @@ static int renoir_set_performance_level(struct smu_context 
*smu,
enum amd_dpm_forced_level level)
 {
int ret = 0;
-   uint32_t sclk_mask, mclk_mask, soc_mask;
+   uint32_t sclk_mask, mclk_mask, soc_mask = 0;
 
switch (level) {
case AMD_DPM_FORCED_LEVEL_HIGH:
@@ -1018,8 +1018,10 @@ static int renoir_set_performance_level(struct 
smu_context *smu,
_mask);
if (ret)
return ret;
-   renoir_force_clk_levels(smu, SMU_SCLK, 1 << sclk_mask);
-   renoir_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
+   if (level == AMD_DPM_FORCED_LEVEL_PROFILE_MIN_SCLK)
+   renoir_force_clk_levels(smu, SMU_SCLK, 1 << sclk_mask);
+   else
+   renoir_force_clk_levels(smu, SMU_MCLK, 1 << mclk_mask);
renoir_force_clk_levels(smu, SMU_SOCCLK, 1 << soc_mask);
break;
case AMD_DPM_FORCED_LEVEL_PROFILE_PEAK:
-- 
2.25.1



[PATCH 2/3] drm/amd/pm: fix the Out-of-bounds read warning

2024-04-26 Thread Jesse Zhang
Using index i - 1U may read beyond the bounds
of mc_data[] when i = 0.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
index b1b4c09c3467..b56298d9da98 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/ppatomctrl.c
@@ -73,8 +73,9 @@ static int atomctrl_retrieve_ac_timing(
j++;
} else if 
((table->mc_reg_address[i].uc_pre_reg_data &
LOW_NIBBLE_MASK) == 
DATA_EQU_PREV) {
-   
table->mc_reg_table_entry[num_ranges].mc_data[i] =
-   
table->mc_reg_table_entry[num_ranges].mc_data[i-1];
+   if (i)
+   
table->mc_reg_table_entry[num_ranges].mc_data[i] =
+   
table->mc_reg_table_entry[num_ranges].mc_data[i-1];
}
}
num_ranges++;
-- 
2.25.1
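The guard reduces to one rule: an entry flagged DATA_EQU_PREV copies the previous column, and column 0 has no previous column, so it is left as-is. A sketch of that rule with a hypothetical flag value and flat arrays in place of the mc_reg table structures:

```c
#include <stdint.h>

/* Hypothetical flag value; the driver defines its own DATA_EQU_PREV. */
#define DATA_EQU_PREV 1

/* Fill one row of mc_data[]: flagged entries copy their left neighbour,
 * except column 0, where i - 1 would wrap and read out of bounds. */
static void fill_row(const uint8_t *pre_flags, uint32_t *mc_data, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++) {
		if (pre_flags[i] == DATA_EQU_PREV && i)
			mc_data[i] = mc_data[i - 1];
	}
}
```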



[PATCH 1/3] drm/amd/pm: Fix negative array index read warning for pptable->DpmDescriptor

2024-04-26 Thread Jesse Zhang
Avoid using negative values of clk_index
as an index into the array pptable->DpmDescriptor.

Signed-off-by: Jesse Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 25 +++
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 5a68d365967f..cd88d2c3841a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -1219,15 +1219,18 @@ static int navi10_get_current_clk_freq_by_table(struct 
smu_context *smu,
   value);
 }
 
-static bool navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum smu_clk_type clk_type)
+static int navi10_is_support_fine_grained_dpm(struct smu_context *smu, enum smu_clk_type clk_type)
 {
PPTable_t *pptable = smu->smu_table.driver_pptable;
DpmDescriptor_t *dpm_desc = NULL;
-   uint32_t clk_index = 0;
+   int clk_index = 0;
 
clk_index = smu_cmn_to_asic_specific_index(smu,
   CMN2ASIC_MAPPING_CLK,
   clk_type);
+   if(clk_index)
+   return clk_index;
+
dpm_desc = &pptable->DpmDescriptor[clk_index];
 
/* 0 - Fine grained DPM, 1 - Discrete DPM */
@@ -1287,7 +1290,11 @@ static int navi10_emit_clk_levels(struct smu_context 
*smu,
if (ret)
return ret;
 
-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu,
  clk_type, 
i, );
@@ -1496,7 +1503,11 @@ static int navi10_print_clk_levels(struct smu_context 
*smu,
if (ret)
return size;
 
-   if (!navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (!ret) {
for (i = 0; i < count; i++) {
ret = smu_v11_0_get_dpm_freq_by_index(smu, 
clk_type, i, );
if (ret)
@@ -1665,7 +1676,11 @@ static int navi10_force_clk_levels(struct smu_context 
*smu,
case SMU_UCLK:
case SMU_FCLK:
/* There is only 2 levels for fine grained DPM */
-   if (navi10_is_support_fine_grained_dpm(smu, clk_type)) {
+   ret = navi10_is_support_fine_grained_dpm(smu, clk_type);
+   if (ret < 0)
+   return ret;
+
+   if (ret) {
soft_max_level = (soft_max_level >= 1 ? 1 : 0);
soft_min_level = (soft_min_level >= 1 ? 1 : 0);
}
-- 
2.25.1



[PATCH] drm/amdgpu: fix the warning about the expression (int)size - len

2024-04-25 Thread Jesse Zhang
Converting size from size_t to int may overflow.
v2: keep reverse xmas tree order (Christian)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f5d0fa207a88..b62ae3c91a9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2065,12 +2065,13 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
struct amdgpu_device *adev = (struct amdgpu_device 
*)file_inode(f)->i_private;
char reg_offset[11];
uint32_t *new = NULL, *tmp = NULL;
-   int ret, i = 0, len = 0;
+   unsigned int len = 0;
+   int ret, i = 0;
 
do {
memset(reg_offset, 0, 11);
if (copy_from_user(reg_offset, buf + len,
-   min(10, ((int)size-len)))) {
+   min(10, (size-len)))) {
ret = -EFAULT;
goto error_free;
}
-- 
2.25.1
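The underlying rule is to keep buffer arithmetic in an unsigned type wide enough for size and to make sure len never passes size, so size - len cannot wrap and no (int) cast of a size_t is needed. A user-space sketch of a chunked copy loop under that rule, assuming a hypothetical CHUNK of 10 bytes mirroring the patch's min(10, ...) and using memcpy() in place of copy_from_user():

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 10	/* mirrors min(10, ...) in the patch */

/* Hypothetical model of the debugfs write loop: consume buf in chunks of
 * at most CHUNK bytes and return how many chunks were taken. */
static size_t count_chunks(const char *buf, size_t size)
{
	char chunk[CHUNK + 1];
	size_t len = 0, n = 0;

	while (len < size) {
		/* len < size here, so size - len is always a valid count. */
		size_t take = size - len < CHUNK ? size - len : CHUNK;

		memset(chunk, 0, sizeof(chunk));
		memcpy(chunk, buf + len, take);
		len += take;
		n++;
	}
	return n;
}
```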



[PATCH V2] drm/amdgpu: fix the warning about the expression (int)size - len

2024-04-25 Thread Jesse Zhang
Converting size from size_t to int may overflow.
v2: keep reverse xmas tree order (Christian)

Signed-off-by: Jesse Zhang 

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f5d0fa207a88..eed60d4b3390 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2065,12 +2065,13 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
struct amdgpu_device *adev = (struct amdgpu_device 
*)file_inode(f)->i_private;
char reg_offset[11];
uint32_t *new = NULL, *tmp = NULL;
+   unsigned int len = 0;
int ret, i = 0, len = 0;
 
do {
memset(reg_offset, 0, 11);
if (copy_from_user(reg_offset, buf + len,
-   min(10, ((int)size-len)))) {
+   min(10, (size-len)))) {
ret = -EFAULT;
goto error_free;
}
-- 
2.25.1



[PATCH] drm/amdgpu: fix the warning about the expression (int)size - len

2024-04-25 Thread Jesse Zhang
Converting size from size_t to int may overflow.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index f5d0fa207a88..b828aad4f35e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -2065,12 +2065,13 @@ static ssize_t 
amdgpu_reset_dump_register_list_write(struct file *f,
struct amdgpu_device *adev = (struct amdgpu_device 
*)file_inode(f)->i_private;
char reg_offset[11];
uint32_t *new = NULL, *tmp = NULL;
-   int ret, i = 0, len = 0;
+   int ret, i = 0;
+   unsigned int len = 0;
 
do {
memset(reg_offset, 0, 11);
if (copy_from_user(reg_offset, buf + len,
-   min(10, ((int)size-len)))) {
+   min(10, (size-len)))) {
ret = -EFAULT;
goto error_free;
}
-- 
2.25.1



[PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-24 Thread Jesse Zhang
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301.
V2: To really improve the handling we would actually
   need to have a separate value of 0x.(Christian)

Signed-off-by: Jesse Zhang 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..968ca2c84ef7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -743,7 +743,8 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t created = 0;
uint32_t allocated = 0;
uint32_t tmp, handle = 0;
-   uint32_t *size = &tmp;
+   uint32_t dummy = 0x;
+   uint32_t *size = &dummy;
unsigned int idx;
int i, r = 0;
 
-- 
2.25.1



[PATCH 4/4 V2] drm/amdgpu: Using uninitialized value *size when calling amdgpu_vce_cs_reloc

2024-04-24 Thread Jesse Zhang
Initialize the size before calling amdgpu_vce_cs_reloc, such as case 0x0301.
V2: To really improve the handling we would actually
need to have a separate value of 0x.(Christian)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..1929de0db3a1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t destroyed = 0;
uint32_t created = 0;
uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0x, handle = 0;
uint32_t *size = &tmp;
unsigned int idx;
int i, r = 0;
-- 
2.25.1



[PATCH] drm/amdgpu: fix some uninitialized variables

2024-04-23 Thread Jesse Zhang
Fix some variables that are used before being initialized.
They were found using Synopsys tools.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +
 drivers/gpu/drm/amd/amdgpu/atom.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c  | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c  | 3 ++-
 drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c  | 3 ++-
 6 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 59acf424a078..60d97cd14855 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -742,7 +742,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p,
uint32_t destroyed = 0;
uint32_t created = 0;
uint32_t allocated = 0;
-   uint32_t tmp, handle = 0;
+   uint32_t tmp = 0, handle = 0;
uint32_t *size = &tmp;
unsigned int idx;
int i, r = 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
index 677eb141554e..13125ddd5e86 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
@@ -410,6 +410,11 @@ static void amdgpu_vcn_idle_work_handler(struct 
work_struct *work)
else
new_state.fw_based = VCN_DPG_STATE__UNPAUSE;
 
+   if 
(amdgpu_fence_count_emitted(adev->jpeg.inst->ring_dec))
+   new_state.jpeg = VCN_DPG_STATE__PAUSE;
+   else
+   new_state.jpeg = VCN_DPG_STATE__UNPAUSE;
+
adev->vcn.pause_dpg_mode(adev, j, &new_state);
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/atom.c 
b/drivers/gpu/drm/amd/amdgpu/atom.c
index 72362df352f6..d552e013354c 100644
--- a/drivers/gpu/drm/amd/amdgpu/atom.c
+++ b/drivers/gpu/drm/amd/amdgpu/atom.c
@@ -1243,6 +1243,7 @@ static int amdgpu_atom_execute_table_locked(struct 
atom_context *ctx, int index,
ectx.ps_size = params_size;
ectx.abort = false;
ectx.last_jump = 0;
+   ectx.last_jump_jiffies = 0;
if (ws) {
ectx.ws = kcalloc(4, ws, GFP_KERNEL);
ectx.ws_size = ws;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
index 45a2d0a5a2d7..b7d33d78bce0 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
@@ -999,7 +999,8 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 20);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
index 43e64b2da575..cc9e961f0078 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
@@ -839,7 +839,8 @@ static int sdma_v5_2_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 20);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
index 1f4877195213..c833b6b8373b 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c
@@ -861,7 +861,8 @@ static int sdma_v6_0_ring_test_ring(struct amdgpu_ring 
*ring)
r = amdgpu_ring_alloc(ring, 5);
if (r) {
DRM_ERROR("amdgpu: dma failed to lock ring %d (%d).\n", 
ring->idx, r);
-   amdgpu_device_wb_free(adev, index);
+   if (!ring->is_mes_queue)
+   amdgpu_device_wb_free(adev, index);
return r;
}
 
-- 
2.25.1



[PATCH] drm/ttm: remove unused parameter

2024-03-25 Thread Jesse Zhang
Remove the unused parameters from the functions
ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index edf10618fe2b..7f08787687a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo)
 EXPORT_SYMBOL(ttm_bo_put);
 
 static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo,
-struct ttm_resource **mem,
 struct ttm_operation_ctx *ctx,
 struct ttm_place *hop)
 {
@@ -470,7 +469,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
if (ret != -EMULTIHOP)
break;
 
-   ret = ttm_bo_bounce_temp_buffer(bo, _mem, ctx, );
+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, );
} while (!ret);
 
if (ret) {
@@ -699,7 +698,6 @@ EXPORT_SYMBOL(ttm_bo_unpin);
  */
 static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
 struct ttm_resource_manager *man,
-struct ttm_resource *mem,
 bool no_wait_gpu)
 {
struct dma_fence *fence;
@@ -753,7 +751,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object 
*bo,
return ret;
} while (1);
 
-   return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
+   return ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
 }
 
 /**
@@ -802,7 +800,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo,
if (unlikely(ret))
goto error;
 
-   ret = ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu);
+   ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu);
if (unlikely(ret)) {
ttm_resource_free(bo, mem);
if (ret == -EBUSY)
@@ -866,7 +864,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo,
 bounce:
ret = ttm_bo_handle_move_mem(bo, mem, false, ctx, );
if (ret == -EMULTIHOP) {
-   ret = ttm_bo_bounce_temp_buffer(bo, , ctx, );
+   ret = ttm_bo_bounce_temp_buffer(bo, ctx, );
if (ret)
goto out;
/* try and move to final place now. */
-- 
2.25.1



[PATCH] drm/amdkfd: fix shift out of bounds about gpu debug

2024-02-29 Thread Jesse Zhang
The issue is:
[  388.151802] UBSAN: shift-out-of-bounds in 
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_int_process_v10.c:346:5
[  388.151807] shift exponent 4294967295 is too large for 64-bit type 'long 
long unsigned int'
[  388.151812] CPU: 6 PID: 347 Comm: kworker/6:1H Tainted: GE  
6.7.0+ #1
[  388.151814] Hardware name: AMD Splinter/Splinter-GNR, BIOS WS54117N_140 
01/16/2024
[  388.151816] Workqueue: KFD IH interrupt_wq [amdgpu]
[  388.152084] Call Trace:
[  388.152086]  
[  388.152089]  dump_stack_lvl+0x4c/0x70
[  388.152096]  dump_stack+0x14/0x20
[  388.152098]  ubsan_epilogue+0x9/0x40
[  388.152101]  __ubsan_handle_shift_out_of_bounds+0x113/0x170
[  388.152103]  ? vprintk+0x40/0x70
[  388.152106]  ? swsusp_check+0x131/0x190
[  388.152110]  event_interrupt_wq_v10.cold+0x16/0x1e [amdgpu]
[  388.152411]  ? raw_spin_rq_unlock+0x14/0x40
[  388.152415]  ? finish_task_switch+0x85/0x2a0
[  388.152417]  ? kfifo_copy_out+0x5f/0x70
[  388.152420]  interrupt_wq+0xb2/0x120 [amdgpu]
[  388.152642]  ? interrupt_wq+0xb2/0x120 [amdgpu]
[  388.152728]  process_scheduled_works+0x9a/0x3a0
[  388.152731]  ? __pfx_worker_thread+0x10/0x10
[  388.152732]  worker_thread+0x15f/0x2d0
[  388.152733]  ? __pfx_worker_thread+0x10/0x10
[  388.152734]  kthread+0xfb/0x130
[  388.152735]  ? __pfx_kthread+0x10/0x10
[  388.152736]  ret_from_fork+0x3d/0x60
[  388.152738]  ? __pfx_kthread+0x10/0x10
[  388.152739]  ret_from_fork_asm+0x1b/0x30
[  388.152742]  

Signed-off-by: Jesse Zhang 
---
 include/uapi/linux/kfd_ioctl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ce46edc62a5..3d5867df17e8 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -887,7 +887,7 @@ enum kfd_dbg_trap_exception_code {
 };
 
 /* Mask generated by ecode in kfd_dbg_trap_exception_code */
-#define KFD_EC_MASK(ecode) (1ULL << (ecode - 1))
+#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL)
 
 /* Masks for exception code type checks below */
 #define KFD_EC_MASK_QUEUE  (KFD_EC_MASK(EC_QUEUE_WAVE_ABORT) | \
-- 
2.25.1
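The UBSAN shift exponent 4294967295 is what (ecode - 1) evaluates to when a 32-bit unsigned ecode is 0. A small sketch of the patched macro in use; the enum values are assumed to mirror the first entries of enum kfd_dbg_trap_exception_code, and exception_mask() is a hypothetical runtime caller such as an interrupt handler decoding a hardware value.

```c
#include <stdint.h>

/* Assumed to mirror the start of enum kfd_dbg_trap_exception_code. */
enum ec_code { EC_NONE = 0, EC_QUEUE_WAVE_ABORT = 1, EC_QUEUE_WAVE_TRAP = 2 };

/* Patched macro: bit (ecode - 1), or an empty mask for ecode == 0.
 * Unpatched, an unsigned 0 gives a shift count of UINT32_MAX. */
#define KFD_EC_MASK(ecode) (ecode ? (1ULL << (ecode - 1)) : 0ULL)

/* Hypothetical runtime use with a value decoded from hardware. */
static uint64_t exception_mask(uint32_t ecode)
{
	return KFD_EC_MASK(ecode);
}
```

The macro argument is not parenthesized, so it is safest with a plain identifier or constant, and shift counts of 64 or more (ecode above 64) remain undefined.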



[PATCH V3] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-28 Thread Jesse . Zhang
Fix the issue:
"amdgpu: Failed to create process VM object".

[Why] When amdgpu is initialized, seq64 does its mapping and updates the BO
mapping in the VM page table.
Later, when clinfo runs, it also initializes a VM for the process device
through kfd_process_device_init_vm() and checks that the root PD is clean
through amdgpu_vm_pt_is_root_clean().
The two conflict, so the clean-root-PD check fails and clinfo always fails.

v1:
  - remove all the pte_support_ats stuff from the amdgpu_vm code (Felix)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  3 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 56 +--
 3 files changed, 1 insertion(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..d004ace79536 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1385,10 +1385,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
struct amdgpu_bo_va_mapping, list);
list_del(&mapping->list);
 
-   if (vm->pte_support_ats &&
-   mapping->start < AMDGPU_GMC_HOLE_START)
-   init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
-
r = amdgpu_vm_update_range(adev, vm, false, false, true, false,
   resv, mapping->start, mapping->last,
   init_pte_value, 0, 0, NULL, NULL,
@@ -2264,7 +2260,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (r)
return r;
 
-   vm->pte_support_ats = false;
vm->is_compute_context = false;
 
vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
@@ -2350,30 +2345,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  */
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
-   bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
int r;
 
r = amdgpu_bo_reserve(vm->root.bo, true);
if (r)
return r;
 
-   /* Check if PD needs to be reinitialized and do it before
-* changing any other state, in case it fails.
-*/
-   if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
-   vm->pte_support_ats = pte_support_ats;
-   r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
-  false);
-   if (r)
-   goto unreserve_bo;
-   }
-
/* Update VM state */
vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
AMDGPU_VM_USE_CPU_FOR_COMPUTE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 42f6ddec50c1..9f6b5e1ccf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -357,9 +357,6 @@ struct amdgpu_vm {
/* Functions to use for VM table updates */
const struct amdgpu_vm_update_funcs *update_funcs;
 
-   /* Flag to indicate ATS support from PTE for GFX9 */
-   bool pte_support_ats;
-
/* Up to 128 pending retry page faults */
DECLARE_KFIFO(faults, u64, 128);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index a160265ddc07..2835cb3f76eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -89,22 +89,6 @@ static unsigned int amdgpu_vm_pt_num_entries(struct amdgpu_device *adev,
return AMDGPU_VM_PTE_COUNT(adev);
 }
 
-/**
- * amdgpu_vm_pt_num_ats_entries - return the number of ATS entries in the root PD
- *
- * @adev: amdgpu_device pointer
- *
- * Returns:
- * The number of entries in the root page directory which needs the ATS setting.
- */
-static unsigned int amdgpu_vm_pt_num_ats_entries(struct amdgpu_device *adev)
-{
-   unsigned int shift;
-
-   shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level);
-   return AMDGPU_GMC_HOLE_START >> (shift + AMDGPU_GPU_PAGE_SHIFT);
-}
-
 /**
  * amdgpu_vm_pt_entries_mask - the mask to get the entry number of a PD/PT
  *
@@ -379,7 +363,7 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm,
struct ttm_operation_ctx ctx = { true, false };
struct amdgpu_vm_update_params params;
struct amdgpu_bo *ancestor = &vmbo->bo;
-   unsigned int entries, ats_entries;
+   unsigned int entries;
struct amdgpu_bo *bo = &vmbo->bo;
uint64_

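The conflict described in the [Why] section can be modelled in a few lines. This is not driver code. model_vm, model_map() and the 512-entry root PD are hypothetical stand-ins. The model only shows that once an earlier mapping (standing in for the seq64 mapping) has written into the root page directory, a "root PD must be clean" check like the one removed from amdgpu_vm_make_compute() can no longer pass, so the call fails with -EINVAL. In the driver, that check only ran when pte_support_ats had to change, which per the removed code was the CHIP_RAVEN case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical root page directory size (9 bits of index). */
#define MODEL_ROOT_PD_ENTRIES 512

/* Hypothetical stand-in for a VM: just the root page directory entries. */
struct model_vm {
	uint64_t root_pd[MODEL_ROOT_PD_ENTRIES];
};

/* Rough analogue of amdgpu_vm_pt_is_root_clean(): true only if no root
 * entry has been written yet. */
static bool model_root_is_clean(const struct model_vm *vm)
{
	for (int i = 0; i < MODEL_ROOT_PD_ENTRIES; i++)
		if (vm->root_pd[i])
			return false;
	return true;
}

/* Stands in for an early driver-internal mapping (seq64 in the report)
 * that writes one root entry. */
static void model_map(struct model_vm *vm, int idx, uint64_t pde)
{
	vm->root_pd[idx] = pde;
}

/* Stands in for the sanity check removed from amdgpu_vm_make_compute():
 * refuse to continue unless the root PD is still clean. */
static int model_make_compute_with_check(const struct model_vm *vm)
{
	if (!model_root_is_clean(vm))
		return -EINVAL;
	return 0;
}
```

A fresh model_vm passes the check. After a single model_map() call, the same check returns -EINVAL, which corresponds to the "Failed to create process VM object" path seen with clinfo.
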
[PATCH V3] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-28 Thread Jesse . Zhang
Fix the issue:
"amdgpu: Failed to create process VM object".

[Why] When amdgpu is initialized, seq64 performs a mapping and updates the BO
mapping in the VM page table.
But when clinfo runs, it also initializes a VM for a process device through the
function kfd_process_device_init_vm and ensures the root PD is clean through the
function amdgpu_vm_pt_is_root_clean.
The two conflict, so clinfo always fails.

v1:
  - remove all the pte_support_ats stuff from the amdgpu_vm code (Felix)

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 23 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  3 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 55 +--
 3 files changed, 1 insertion(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..d004ace79536 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1385,10 +1385,6 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
struct amdgpu_bo_va_mapping, list);
list_del(&mapping->list);
 
-   if (vm->pte_support_ats &&
-   mapping->start < AMDGPU_GMC_HOLE_START)
-   init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
-
r = amdgpu_vm_update_range(adev, vm, false, false, true, false,
   resv, mapping->start, mapping->last,
   init_pte_value, 0, 0, NULL, NULL,
@@ -2264,7 +2260,6 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
if (r)
return r;
 
-   vm->pte_support_ats = false;
vm->is_compute_context = false;
 
vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
@@ -2350,30 +2345,12 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
  */
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 {
-   bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
int r;
 
r = amdgpu_bo_reserve(vm->root.bo, true);
if (r)
return r;
 
-   /* Check if PD needs to be reinitialized and do it before
-* changing any other state, in case it fails.
-*/
-   if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
-   vm->pte_support_ats = pte_support_ats;
-   r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
-  false);
-   if (r)
-   goto unreserve_bo;
-   }
-
/* Update VM state */
vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
AMDGPU_VM_USE_CPU_FOR_COMPUTE);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 42f6ddec50c1..9f6b5e1ccf34 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -357,9 +357,6 @@ struct amdgpu_vm {
/* Functions to use for VM table updates */
const struct amdgpu_vm_update_funcs *update_funcs;
 
-   /* Flag to indicate ATS support from PTE for GFX9 */
-   bool pte_support_ats;
-
/* Up to 128 pending retry page faults */
DECLARE_KFIFO(faults, u64, 128);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index a160265ddc07..f07255532aae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -89,22 +89,6 @@ static unsigned int amdgpu_vm_pt_num_entries(struct amdgpu_device *adev,
return AMDGPU_VM_PTE_COUNT(adev);
 }
 
-/**
- * amdgpu_vm_pt_num_ats_entries - return the number of ATS entries in the root PD
- *
- * @adev: amdgpu_device pointer
- *
- * Returns:
- * The number of entries in the root page directory which needs the ATS setting.
- */
-static unsigned int amdgpu_vm_pt_num_ats_entries(struct amdgpu_device *adev)
-{
-   unsigned int shift;
-
-   shift = amdgpu_vm_pt_level_shift(adev, adev->vm_manager.root_level);
-   return AMDGPU_GMC_HOLE_START >> (shift + AMDGPU_GPU_PAGE_SHIFT);
-}
-
 /**
  * amdgpu_vm_pt_entries_mask - the mask to get the entry number of a PD/PT
  *
@@ -394,27 +378,7 @@ int amdgpu_vm_pt_clear(struct amdgpu_device *adev, struct amdgpu_vm *vm,
}
 
entries = amdgpu_bo_size(bo) / 8;
-   if (!vm->pte_support_ats) {
-   ats_entries = 0;
-
-   } else if (!bo->parent) {
-   ats_entries = amdgpu_vm_pt_num_ats_entries(adev);
-   ats_e

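The removed amdgpu_vm_pt_num_ats_entries() counted how many root-PD entries cover addresses below the GMC hole. The worked sketch below reproduces that arithmetic. It assumes that AMDGPU_GMC_HOLE_START is 0x0000800000000000 (2^47), that AMDGPU_GPU_PAGE_SHIFT is 12, and that the root-level shift is 27 (a PDB2 root with a 9-bit block size). These values are assumptions and are not taken from the patch. num_ats_entries() is a hypothetical helper. With these inputs the result is 2^47 >> 39 = 256, which is the lower half of a 512-entry root directory.

```c
#include <stdint.h>

/* Assumed value of AMDGPU_GMC_HOLE_START: start of the non-canonical hole. */
#define MODEL_HOLE_START     0x0000800000000000ULL
/* Assumed AMDGPU_GPU_PAGE_SHIFT: 4 KiB GPU pages. */
#define MODEL_GPU_PAGE_SHIFT 12

/* Hypothetical helper mirroring the removed function's return expression:
 * each root-PD entry covers 2^(root_shift + page_shift) bytes, so this is
 * the number of root entries that lie below the hole. */
static unsigned int num_ats_entries(unsigned int root_shift)
{
	return (unsigned int)(MODEL_HOLE_START >>
			      (root_shift + MODEL_GPU_PAGE_SHIFT));
}
```
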
[PATCH] Revert "drm/amdgpu: remove vm sanity check from amdgpu_vm_make_compute" for Raven

2024-02-27 Thread Jesse . Zhang
Fix the issue seen when running clinfo:
"amdgpu: Failed to create process VM object".

When amdgpu is initialized, seq64 performs a mapping and updates the BO mapping
in the VM page table.
But when clinfo runs, it also initializes a VM for a process device through the
function kfd_process_device_init_vm
and ensures the root PD is clean through the function amdgpu_vm_pt_is_root_clean.
The two conflict, so clinfo always fails.

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index ed4a8c5d26d7..0bc0bc75be15 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2361,12 +2361,6 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 * changing any other state, in case it fails.
 */
if (pte_support_ats != vm->pte_support_ats) {
-   /* Sanity checks */
-   if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-   r = -EINVAL;
-   goto unreserve_bo;
-   }
-
vm->pte_support_ats = pte_support_ats;
r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
   false);
-- 
2.34.1



[PATCH] drm/amdgpu: Fix NULL pointer issue

2023-10-26 Thread Jesse Zhang
Add a check for the ras pointer.
The issue was introduced by commit be5c7eb104067d61.

[ 2312.987618] BUG: kernel NULL pointer dereference, address: 00e8
[ 2312.987622] #PF: supervisor read access in kernel mode
[ 2312.987624] #PF: error_code(0x) - not-present page
[ 2312.987625] PGD 0 P4D 0
[ 2312.987627] Oops:  [#1] PREEMPT SMP NOPTI
[ 2312.987630] CPU: 9 PID: 1749 Comm: modprobe Not tainted 6.3.7-38fc8aadcfb2 #1
[ 2312.987632] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS TLD1001Bb 12/01/2020
[ 2312.987634] RIP: 0010:amdgpu_ras_reset_error_count+0x126/0x140 [amdgpu]
[ 2312.987852] Code: 10 48 c7 c1 ec 6a 54 c1 77 08 4a 8b 0c ed c0 35 59 c1 48 8b 33 48 c7 c2 78 a7 4d c1 48 c7 c7 60 a4 5c c1 e8 8c 9e ca d0 eb bf <41> 8b 86 e8 00 00 00 85 c0 0f 84 37 ff ff ff e9 26 ff ff ff 31 c0
[ 2312.987855] RSP: 0018:a40402e378e0 EFLAGS: 00010246
[ 2312.987856] RAX:  RBX: 90cf0958 RCX: 
[ 2312.987858] RDX:  RSI: 0006 RDI: 90cf0958
[ 2312.987859] RBP: a40402e37908 R08:  R09: c000fffe
[ 2312.987860] R10:  R11: a40402e37640 R12: c1593d80
[ 2312.987861] R13: 0006 R14:  R15: 
[ 2312.987862] FS:  7fb5d3b33c40() GS:90d00684() knlGS:
[ 2312.987864] CS:  0010 DS:  ES:  CR0: 80050033
[ 2312.987865] CR2: 00e8 CR3: 00010ae24000 CR4: 00750ee0
[ 2312.987867] PKRU: 5554
[ 2312.987868] Call Trace:
[ 2312.987870]  
[ 2312.987872]  ? show_regs+0x5b/0x70
[ 2312.987877]  ? __die_body+0x1f/0x70
[ 2312.987879]  ? __die+0x2a/0x40
[ 2312.987881]  ? page_fault_oops+0x156/0x470
[ 2312.987884]  ? dev_printk_emit+0x87/0xc0
[ 2312.987889]  ? do_user_addr_fault+0x34a/0x720
[ 2312.987891]  ? exc_page_fault+0x75/0x180
[ 2312.987895]  ? asm_exc_page_fault+0x27/0x30
[ 2312.987898]  ? amdgpu_ras_reset_error_count+0x126/0x140 [amdgpu]
[ 2312.987980]  gmc_v9_0_late_init+0x7f/0xc0 [amdgpu]
[ 2312.988064]  amdgpu_device_ip_late_init+0x49/0x2b0 [amdgpu]
[ 2312.988144]  ? mutex_lock+0x12/0x40
[ 2312.988148]  amdgpu_device_init+0x2253/0x24e0 [amdgpu]
[ 2312.988225]  ? pci_read_config_word+0x23/0x40
[ 2312.988230]  amdgpu_driver_load_kms+0x1a/0x1a0 [amdgpu]
[ 2312.988278]  amdgpu_pci_probe+0x16b/0x4a0 [amdgpu]
[ 2312.988278]  local_pci_probe+0x4a/0xb0
[ 2312.988278]  pci_device_probe+0xd9/0x240
[ 2312.988278]  really_probe+0x116/0x3e0
[ 2312.988278]  ? pm_runtime_barrier+0x55/0xa0
[ 2312.988278]  __driver_probe_device+0x81/0x160
[ 2312.988278]  driver_probe_device+0x24/0xb0
[ 2312.988278]  __driver_attach+0x10e/0x170
[ 2312.988278]  ? __device_attach_driver+0x120/0x120
[ 2312.988278]  bus_for_each_dev+0x7b/0xd0
[ 2312.988278]  driver_attach+0x1e/0x30
[ 2312.988278]  bus_add_driver+0x11d/0x220
[ 2312.988278]  ? 0xc0b56000
[ 2312.988278]  driver_register+0x5e/0x120
[ 2312.988278]  ? 0xc0b56000
[ 2312.988278]  __pci_register_driver+0x68/0x70
[ 2312.988278]  amdgpu_init+0x74/0x1000 [amdgpu]
[ 2312.988278]  do_one_initcall+0x48/0x210
[ 2312.988278]  ? kmalloc_trace+0x2a/0xa0
[ 2312.988278]  do_init_module+0x4f/0x1f3
[ 2312.988278]  load_module+0x21fe/0x23f0
[ 2312.988278]  ? kernel_read_file+0x291/0x310
[ 2312.988278]  __do_sys_finit_module+0xc0/0x130
[ 2312.988278]  ? __do_sys_finit_module+0xc0/0x130
[ 2312.988278]  __x64_sys_finit_module+0x1a/0x20
[ 2312.988278]  do_syscall_64+0x3a/0x90
[ 2312.988278]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 303fbb6a48b6..33801a5bb460 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1223,7 +1223,7 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
const struct amdgpu_mca_smu_funcs *mca_funcs = adev->mca.mca_funcs;
 
-   if (!block_obj || !block_obj->hw_ops) {
+   if (!block_obj || !block_obj->hw_ops || !ras) {
dev_dbg_once(adev->dev, "%s doesn't config RAS function\n",
ras_block_str(block));
return -EOPNOTSUPP;
-- 
2.25.1
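
The following is a simplified sketch of the guard that the patch adds. The structures are hypothetical stand-ins and do not match the real amdgpu RAS types. The sketch shows only one point: the function returns -EOPNOTSUPP before it touches the RAS context, so a NULL context from amdgpu_ras_get_context() can no longer be dereferenced. The oops above appears consistent with this. The faulting instruction (41 8b 86 e8 00 00 00) reads offset 0xe8 from R14, and CR2 is 0xe8.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver types carry far more state. */
struct model_ras {
	unsigned long err_count;
};

struct model_block_obj {
	const void *hw_ops;
};

/* Mirrors the patched guard in amdgpu_ras_reset_error_count(): return
 * -EOPNOTSUPP before any field of the RAS context is read. */
static int model_reset_error_count(const struct model_block_obj *block_obj,
				   struct model_ras *ras)
{
	if (!block_obj || !block_obj->hw_ops || !ras)
		return -EOPNOTSUPP;

	/* ras is known to be non-NULL past this point. */
	ras->err_count = 0;
	return 0;
}
```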



[PATCH] drm/amdkfd: Fix shift out-of-bounds issue and remove unused code.

2023-10-18 Thread Jesse Zhang
[  567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int'
[  567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G   OE  6.2.0-34-generic #34~22.04.1-Ubuntu
[  567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023
[  567.614504] Workqueue: events send_exception_work_handler [amdgpu]
[  567.614748] Call Trace:
[  567.614750]  
[  567.614753]  dump_stack_lvl+0x48/0x70
[  567.614761]  dump_stack+0x10/0x20
[  567.614763]  __ubsan_handle_shift_out_of_bounds+0x156/0x310
[  567.614769]  ? srso_alias_return_thunk+0x5/0x7f
[  567.614773]  ? update_sd_lb_stats.constprop.0+0xf2/0x3c0
[  567.614780]  svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu]
[  567.615047]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615052]  svm_migrate_to_ram+0x185/0x4d0 [amdgpu]
[  567.615286]  do_swap_page+0x7b6/0xa30
[  567.615291]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615294]  ? __free_pages+0x119/0x130
[  567.615299]  handle_pte_fault+0x227/0x280
[  567.615303]  __handle_mm_fault+0x3c0/0x720
[  567.615311]  handle_mm_fault+0x119/0x330
[  567.615314]  ? lock_mm_and_find_vma+0x44/0x250
[  567.615318]  do_user_addr_fault+0x1a9/0x640
[  567.615323]  exc_page_fault+0x81/0x1b0
[  567.615328]  asm_exc_page_fault+0x27/0x30
[  567.615332] RIP: 0010:__get_user_8+0x1c/0x30

Suggested-by: Philip Yang 
Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 62 +---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h |  3 --
 2 files changed, 1 insertion(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 54af7a2b29f8..ccaf86a4c02a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -781,7 +781,7 @@ svm_range_apply_attrs(struct kfd_process *p, struct svm_range *prange,
prange->flags &= ~attrs[i].value;
break;
case KFD_IOCTL_SVM_ATTR_GRANULARITY:
-   prange->granularity = attrs[i].value;
+   prange->granularity = attrs[i].value & 0x3F;
break;
default:
WARN_ONCE(1, "svm_range_check_attrs wasn't called?");
@@ -1139,66 +1139,6 @@ svm_range_add_child(struct svm_range *prange, struct mm_struct *mm,
list_add_tail(&pchild->child_list, &prange->child_list);
 }
 
-/**
- * svm_range_split_by_granularity - collect ranges within granularity boundary
- *
- * @p: the process with svms list
- * @mm: mm structure
- * @addr: the vm fault address in pages, to split the prange
- * @parent: parent range if prange is from child list
- * @prange: prange to split
- *
- * Trims @prange to be a single aligned block of prange->granularity if
- * possible. The head and tail are added to the child_list in @parent.
- *
- * Context: caller must hold mmap_read_lock and prange->lock
- *
- * Return:
- * 0 - OK, otherwise error code
- */
-int
-svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm,
-  unsigned long addr, struct svm_range *parent,
-  struct svm_range *prange)
-{
-   struct svm_range *head, *tail;
-   unsigned long start, last, size;
-   int r;
-
-   /* Align splited range start and size to granularity size, then a single
-* PTE will be used for whole range, this reduces the number of PTE
-* updated and the L1 TLB space used for translation.
-*/
-   size = 1UL << prange->granularity;
-   start = ALIGN_DOWN(addr, size);
-   last = ALIGN(addr + 1, size) - 1;
-
-   pr_debug("svms 0x%p split [0x%lx 0x%lx] to [0x%lx 0x%lx] size 0x%lx\n",
-prange->svms, prange->start, prange->last, start, last, size);
-
-   if (start > prange->start) {
-   r = svm_range_split(prange, start, prange->last, &head);
-   if (r)
-   return r;
-   svm_range_add_child(parent, mm, head, SVM_OP_ADD_RANGE);
-   }
-
-   if (last < prange->last) {
-   r = svm_range_split(prange, prange->start, last, &tail);
-   if (r)
-   return r;
-   svm_range_add_child(parent, mm, tail, SVM_OP_ADD_RANGE);
-   }
-
-   /* xnack on, update mapping on GPUs with ACCESS_IN_PLACE */
-   if (p->xnack_enabled && prange->work_item.op == SVM_OP_ADD_RANGE) {
-   prange->work_item.op = SVM_OP_ADD_RANGE_AND_MAP;
-   pr_debug("change prange 0x%p [0x%lx 0x%lx] op %d\n",
-prange, prange->start, prange->last,
-SVM_OP_ADD_RANGE_AND_MAP);
-   }
-   return 0;
-}
 static bool
 svm_nodes_in_same_hive(struct kfd_node *node_a, struct kfd_node *node_b)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/d

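This version masks the granularity when the attribute is applied. As a result, the stored value is always in the range 0..63, and any later 1UL << granularity stays within a 64-bit unsigned long. The minimal sketch below assumes a cut-down range struct with an 8-bit granularity field. model_range and model_set_granularity() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical cut-down range: only the field this hunk touches. The 8-bit
 * width is an assumption about struct svm_range. */
struct model_range {
	uint8_t granularity;
};

/* Mirrors the patched KFD_IOCTL_SVM_ATTR_GRANULARITY case: keep the low six
 * bits only, so a later 1UL << granularity shifts by at most 63. */
static void model_set_granularity(struct model_range *r, uint32_t value)
{
	r->granularity = value & 0x3F;
}
```

Given the value from the UBSAN report, model_set_granularity(&r, 255) stores 63. Masking keeps the shift defined, but it silently maps an out-of-range value to a different valid one. Rejecting such values in svm_range_check_attrs() would be an alternative design.
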
[PATCH] drm/amdkfd: Fix shift out-of-bounds issue

2023-10-18 Thread Jesse Zhang
[  567.613292] shift exponent 255 is too large for 64-bit type 'long unsigned int'
[  567.614498] CPU: 5 PID: 238 Comm: kworker/5:1 Tainted: G   OE  6.2.0-34-generic #34~22.04.1-Ubuntu
[  567.614502] Hardware name: AMD Splinter/Splinter-RPL, BIOS WS43927N_871 09/25/2023
[  567.614504] Workqueue: events send_exception_work_handler [amdgpu]
[  567.614748] Call Trace:
[  567.614750]  
[  567.614753]  dump_stack_lvl+0x48/0x70
[  567.614761]  dump_stack+0x10/0x20
[  567.614763]  __ubsan_handle_shift_out_of_bounds+0x156/0x310
[  567.614769]  ? srso_alias_return_thunk+0x5/0x7f
[  567.614773]  ? update_sd_lb_stats.constprop.0+0xf2/0x3c0
[  567.614780]  svm_range_split_by_granularity.cold+0x2b/0x34 [amdgpu]
[  567.615047]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615052]  svm_migrate_to_ram+0x185/0x4d0 [amdgpu]
[  567.615286]  do_swap_page+0x7b6/0xa30
[  567.615291]  ? srso_alias_return_thunk+0x5/0x7f
[  567.615294]  ? __free_pages+0x119/0x130
[  567.615299]  handle_pte_fault+0x227/0x280
[  567.615303]  __handle_mm_fault+0x3c0/0x720
[  567.615311]  handle_mm_fault+0x119/0x330
[  567.615314]  ? lock_mm_and_find_vma+0x44/0x250
[  567.615318]  do_user_addr_fault+0x1a9/0x640
[  567.615323]  exc_page_fault+0x81/0x1b0
[  567.615328]  asm_exc_page_fault+0x27/0x30
[  567.615332] RIP: 0010:__get_user_8+0x1c/0x30

Signed-off-by: Jesse Zhang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 7b81233bc9ae..f5e0bccc6d71 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1169,7 +1169,7 @@ svm_range_split_by_granularity(struct kfd_process *p, struct mm_struct *mm,
 * PTE will be used for whole range, this reduces the number of PTE
 * updated and the L1 TLB space used for translation.
 */
-   size = 1UL << prange->granularity;
+   size = 1UL << (prange->granularity & 0x3f);
start = ALIGN_DOWN(addr, size);
last = ALIGN(addr + 1, size) - 1;
 
-- 
2.25.1
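
In this variant, the mask is applied at the point of use instead. The sketch below reproduces the split-boundary arithmetic from the hunk in userspace. MODEL_ALIGN_DOWN and MODEL_ALIGN are hypothetical stand-ins for the kernel's ALIGN_DOWN() and ALIGN(), and they are valid only for power-of-two sizes. The sketch also assumes a 64-bit unsigned long. As in the original comment, addr is a page number.

```c
#include <stdint.h>

/* Stand-ins for ALIGN_DOWN()/ALIGN(); correct only for power-of-two a. */
#define MODEL_ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
#define MODEL_ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))

/* Mirrors the patched arithmetic in svm_range_split_by_granularity():
 * the shift count is masked to 0..63 before it is used. */
static void model_split_bounds(unsigned long addr, uint8_t granularity,
			       unsigned long *start, unsigned long *last)
{
	unsigned long size = 1UL << (granularity & 0x3f);

	*start = MODEL_ALIGN_DOWN(addr, size);
	*last = MODEL_ALIGN(addr + 1, size) - 1;
}
```

For addr 1000 and granularity 9, the aligned block is [512, 1023]. For the value 255 from the log, the mask reduces the shift to 63, so the block becomes [0, 2^63 - 1] and the shift is no longer undefined behaviour.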


