Re: [PATCH 2/2] drm:amdgpu: add firmware information of all IP's

2024-03-13 Thread Khatri, Sunil



On 3/14/2024 2:06 AM, Alex Deucher wrote:

On Tue, Mar 12, 2024 at 8:42 AM Sunil Khatri  wrote:

Add firmware version information of each
IP and each instance where applicable.


Is there a way we can share some common code with devcoredump,
debugfs, and the info IOCTL?  All three places need to query this
information and the same logic is repeated in each case.


Hello Alex,

Yes, you are absolutely right: the same information is being retrieved 
again, as is already done in debugfs. I can reorganize the code so the same 
function could be used by both debugfs and devcoredump, but that is exactly 
what I tried to avoid here. I tried to keep the devcoredump changes minimal 
without shuffling a lot of code around.


Also, our devcoredump is implemented in amdgpu_reset.c, and not all of the 
information is available there, so we might have to include a lot of 
headers and cross-reference functions in amdgpu_reset.c unless we create a 
dedicated file for devcoredump.


The info IOCTL does expose a lot of information that is also in the pipeline 
to be dumped, but if we want to reuse the IOCTL's functionality we would 
need to reorganize a lot of code.


If that is the need of the hour, I could work on that. Please let me know.

This is my suggestion, if it makes sense (see the sketch after this list):

1. If we want to reuse a lot of functionality, we need to modularize some 
of the functions further so they can be consumed directly by devcoredump.
2. We should also have dedicated devcoredump.c/.h files so it is easy to 
include the headers of the needed functionality cleanly and easy to expand 
devcoredump.
3. Based on the priority and importance of this task we can add the 
information now; otherwise some repetition is a real possibility.
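
To make point 1 concrete, here is a rough sketch of what such a shared helper 
could look like. The file and function names (amdgpu_fw_info.c, 
amdgpu_fw_info_print) are hypothetical, not existing driver code; the idea is 
only that the version dump lives behind one entry point which debugfs, 
devcoredump, and the info IOCTL could all call:

/* amdgpu_fw_info.c (hypothetical): single source of truth for fw versions */
#include "amdgpu.h"

void amdgpu_fw_info_print(struct amdgpu_device *adev, struct drm_printer *p)
{
	drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n",
		   adev->gfx.me_feature_version, adev->gfx.me_fw_version);
	drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n",
		   adev->gfx.pfp_feature_version, adev->gfx.pfp_fw_version);
	/* ... remaining IPs, exactly as listed in this patch ... */
}

/* The devcoredump path in amdgpu_reset.c would then shrink to: */
static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
				       struct drm_printer *p)
{
	amdgpu_fw_info_print(adev, p);
}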




Alex



Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 ++
  1 file changed, 122 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 611fdb90a1fc..78ddc58aef67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -168,6 +168,123 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool 
vram_lost,
  {
  }
  #else
+static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev, struct 
drm_printer *p)
+{
+   uint32_t version;
+   uint32_t feature;
+   uint8_t smu_program, smu_major, smu_minor, smu_debug;
+
+   drm_printf(p, "VCE feature version: %u, fw version: 0x%08x\n",
+  adev->vce.fb_version, adev->vce.fw_version);
+   drm_printf(p, "UVD feature version: %u, fw version: 0x%08x\n",
+  0, adev->uvd.fw_version);
+   drm_printf(p, "GMC feature version: %u, fw version: 0x%08x\n",
+  0, adev->gmc.fw_version);
+   drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.me_feature_version, adev->gfx.me_fw_version);
+   drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.pfp_feature_version, adev->gfx.pfp_fw_version);
+   drm_printf(p, "CE feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.ce_feature_version, adev->gfx.ce_fw_version);
+   drm_printf(p, "RLC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_feature_version, adev->gfx.rlc_fw_version);
+
+   drm_printf(p, "RLC SRLC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srlc_feature_version,
+  adev->gfx.rlc_srlc_fw_version);
+   drm_printf(p, "RLC SRLG feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srlg_feature_version,
+  adev->gfx.rlc_srlg_fw_version);
+   drm_printf(p, "RLC SRLS feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srls_feature_version,
+  adev->gfx.rlc_srls_fw_version);
+   drm_printf(p, "RLCP feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlcp_ucode_feature_version,
+  adev->gfx.rlcp_ucode_version);
+   drm_printf(p, "RLCV feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlcv_ucode_feature_version,
+  adev->gfx.rlcv_ucode_version);
+   drm_printf(p, "MEC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.mec_feature_version,
+  adev->gfx.mec_fw_version);
+
+   if (adev->gfx.mec2_fw)
+   drm_printf(p,
+  "MEC2 feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.mec2_feature_version,
+  adev->gfx.mec2_fw_version);
+
+   drm_printf(p, "IMU feature version: %u, fw version: 0x%08x\n",
+  0, adev->gfx.imu_fw_version);
+   drm_printf(p, "PSP SOS feature version: %u, fw version: 0x%08x\n",
+  adev->psp.sos.feature_version,
+  adev->psp.sos.fw_version);
+   drm_printf(p, "PSP ASD 

Re: [PATCH] Documentation: add a page on amdgpu debugging

2024-03-13 Thread Friedrich Vock

On 13.03.24 22:01, Alex Deucher wrote:

Covers GPU page fault debugging and adds a reference
to umr.

Signed-off-by: Alex Deucher
---
  Documentation/gpu/amdgpu/debugging.rst | 79 ++
  Documentation/gpu/amdgpu/index.rst |  1 +
  2 files changed, 80 insertions(+)
  create mode 100644 Documentation/gpu/amdgpu/debugging.rst

diff --git a/Documentation/gpu/amdgpu/debugging.rst 
b/Documentation/gpu/amdgpu/debugging.rst
new file mode 100644
index ..29971a7a6815
--- /dev/null
+++ b/Documentation/gpu/amdgpu/debugging.rst
@@ -0,0 +1,79 @@
+===
+ GPU Debugging
+===
+
+GPUVM Debugging
+===
+
+To aid in debugging GPU virtual memory related problems, the driver supports a
+number of module parameters:
+
+`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.
+
+`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
+the GPU.
+
+
+Decoding a GPUVM Page Fault
+===
+
+If you see a GPU page fault in the kernel log, you can decode it to figure
+out what is going wrong in your application.  A page fault in your kernel
+log may look something like this:
+
+::
+
+ [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for 
process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
+   in page starting at address 0x80010280 from IH client 0x1b (UTCL2)
+ VM_L2_PROTECTION_FAULT_STATUS:0x00301030
+   Faulty UTCL2 client ID: TCP (0x8)
+   MORE_FAULTS: 0x0
+   WALKER_ERROR: 0x0
+   PERMISSION_FAULTS: 0x3
+   MAPPING_ERROR: 0x0
+   RW: 0x0
+
+First you have the memory hub, gfxhub and mmhub.  gfxhub is the memory
+hub used for graphics, compute, and sdma on some chips.  mmhub is the
+memory hub used for multi-media and sdma on some chips.
+
+Next you have the vmid and pasid.  If the vmid is 0, this fault was likely
+caused by the kernel driver or firmware.  If the vmid is non-0, it is generally
+a fault in a user application.  The pasid is used to link a vmid to a system
+process id.  If the process is active when the fault happens, the process
+information will be printed.
+
+The GPU virtual address that caused the fault comes next.
+
+The client ID indicates the GPU block that caused the fault.
+Some common client IDs:
+
+- CB/DB: The color/depth backend of the graphics pipe
+- CPF: Command Processor Frontend
+- CPC: Command Processor Compute
+- CPG: Command Processor Graphics
+- TCP: Shaders


For shader accesses, maybe including SQC (data)/SQC (inst) for SMEM
accesses/instruction prefetching would be useful?

Thanks,
Friedrich


+- SDMA: SDMA engines
+- VCN: Video encode/decode engines
+- JPEG: JPEG engines
+
+PERMISSION_FAULTS describe what faults were encountered:
+
+- bit 0: the PTE was not valid
+- bit 1: the PTE read bit was not set
+- bit 2: the PTE write bit was not set
+- bit 3: the PTE execute bit was not set
+
+Finally, RW indicates whether the access was a read (0) or a write (1).
+
+In the example above, a shader (client id = TCP) generated a read (RW = 0x0) to
+an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
+0x80010280.  The user can then inspect their shader
+code and resource descriptor state to determine what caused the GPU page fault.
+
+UMR
+===
+
+`umr`_ is a general purpose
+GPU debugging and diagnostics tool.  Please see the umr documentation for
+more information about its capabilities.
diff --git a/Documentation/gpu/amdgpu/index.rst 
b/Documentation/gpu/amdgpu/index.rst
index 912e699fd373..847e04924030 100644
--- a/Documentation/gpu/amdgpu/index.rst
+++ b/Documentation/gpu/amdgpu/index.rst
@@ -15,4 +15,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) 
architectures.
 ras
 thermal
 driver-misc
+   debugging
 amdgpu-glossary


Re: [PATCH 1/2] drm/amdgpu: add the IP information of the soc

2024-03-13 Thread Khatri, Sunil



On 3/14/2024 1:58 AM, Alex Deucher wrote:

On Tue, Mar 12, 2024 at 8:41 AM Sunil Khatri  wrote:

Add all the IP's information on a SOC to the
devcoredump.

Signed-off-by: Sunil Khatri 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index a0dbccad2f53..611fdb90a1fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -196,6 +196,25 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, 
size_t count,
coredump->reset_task_info.process_name,
coredump->reset_task_info.pid);

+   /* GPU IP's information of the SOC */
+   if (coredump->adev) {
+   drm_printf(&p, "\nIP Information\n");
+   drm_printf(&p, "SOC Family: %d\n", coredump->adev->family);
+   drm_printf(&p, "SOC Revision id: %d\n", coredump->adev->rev_id);
+
+   for (int i = 0; i < coredump->adev->num_ip_blocks; i++) {
+   struct amdgpu_ip_block *ip =
+   &coredump->adev->ip_blocks[i];
+   drm_printf(&p, "IP type: %d IP name: %s\n",
+  ip->version->type,
+  ip->version->funcs->name);
+   drm_printf(&p, "IP version: (%d,%d,%d)\n\n",
+  ip->version->major,
+  ip->version->minor,
+  ip->version->rev);
+   }
+   }

I think the IP discovery table would be more useful.  Either walk the
adev->ip_versions structure, or just include the IP discovery binary.


I did explore adev->ip_versions, and just walking through the array does 
not give any useful information directly.
There is no way to find the items below directly from adev->ip_versions 
unless I also reparse the discovery binary, as is done in 
amdgpu_discovery_reg_base_init, and walk through the headers of the various 
IPs in the discovery binary.

a. Which IPs are present on the SOC.
b. How many instances of each IP there are.
Also, I would again have to convert back to the major, minor, and rev 
convention for this information to be useful. I am exploring it further; if 
I find other useful information, I will update.


adev->ip_blocks[] is derived from IP discovery and only contains the blocks 
that are present on the SOC, so we are not reading information that isn't 
applicable to the SOC. We have the name, type, and version number of the IPs 
available on the SOC. If you want, I could add the number of instances of 
each IP too, if you think that is useful information here. Could you share 
what information is missing in this approach so I can include it?


As for dumping the IP discovery binary, I don't understand how that 
information would be directly useful, since it needs to be decoded the way 
we do in discovery init. Please correct me if my understanding is wrong 
here.
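
For reference, the loop in this patch could also be factored into a small 
helper so the same dump is reusable elsewhere. A minimal sketch follows; the 
helper name is illustrative, not existing code, and it uses only the fields 
already printed above:

static void amdgpu_devcoredump_ip_info(struct amdgpu_device *adev,
				       struct drm_printer *p)
{
	int i;

	drm_printf(p, "\nIP Information\n");
	drm_printf(p, "SOC Family: %d\n", adev->family);
	drm_printf(p, "SOC Revision id: %d\n", adev->rev_id);

	for (i = 0; i < adev->num_ip_blocks; i++) {
		struct amdgpu_ip_block *ip = &adev->ip_blocks[i];

		drm_printf(p, "IP type: %d IP name: %s\n",
			   ip->version->type, ip->version->funcs->name);
		drm_printf(p, "IP version: (%d,%d,%d)\n\n",
			   ip->version->major, ip->version->minor,
			   ip->version->rev);
	}
}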

Alex


+
 if (coredump->ring) {
 drm_printf(&p, "\nRing timed out details\n");
 drm_printf(&p, "IP Type: %d Ring Name: %s\n",
--
2.34.1



Re: [PATCH 1/9] drm/amd/pm: Add support for DPM policies

2024-03-13 Thread Lazar, Lijo
This one is missing some NULL checks. Will send a v2.

Thanks,
Lijo

On 3/13/2024 4:32 PM, Lijo Lazar wrote:
> Add support to set/get information about different DPM policies. The
> support is only available on SOCs which use swsmu architecture.
> 
> A DPM policy type may be defined with different levels. For example, a
> policy may be defined to select Pstate preference and then later a
> pstate preference may be chosen.
> 
> Signed-off-by: Lijo Lazar 
> Reviewed-by: Hawking Zhang 
> ---
>  .../gpu/drm/amd/include/kgd_pp_interface.h| 16 
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 29 ++
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c| 92 ++
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  4 +
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 95 +++
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 29 ++
>  6 files changed, 265 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
> b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> index 32054ecf0b87..e48da7acd7a7 100644
> --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> @@ -272,6 +272,22 @@ enum pp_xgmi_plpd_mode {
>   XGMI_PLPD_COUNT,
>  };
>  
> +enum pp_pm_policy {
> + PP_PM_POLICY_NONE = -1,
> + PP_PM_POLICY_SOC_PSTATE = 0,
> + PP_PM_POLICY_NUM,
> +};
> +
> +enum pp_policy_soc_pstate {
> + SOC_PSTATE_DEFAULT = 0,
> + SOC_PSTATE_0,
> + SOC_PSTATE_1,
> + SOC_PSTATE_2,
> + SOC_PSTAT_COUNT,
> +};
> +
> +#define PP_POLICY_MAX_LEVELS 5
> +
>  #define PP_GROUP_MASK0xF000
>  #define PP_GROUP_SHIFT   28
>  
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
> b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> index f84bfed50681..db3addd07120 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> @@ -411,6 +411,35 @@ int amdgpu_dpm_set_xgmi_plpd_mode(struct amdgpu_device 
> *adev, int mode)
>   return ret;
>  }
>  
> +ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char *buf)
> +{
> + struct smu_context *smu = adev->powerplay.pp_handle;
> + int ret = -EOPNOTSUPP;
> +
> + if (is_support_sw_smu(adev)) {
> + mutex_lock(&adev->pm.mutex);
> + ret = smu_get_pm_policy_info(smu, buf);
> + mutex_unlock(&adev->pm.mutex);
> + }
> +
> + return ret;
> +}
> +
> +int amdgpu_dpm_set_pm_policy(struct amdgpu_device *adev, int policy_type,
> +  int policy_level)
> +{
> + struct smu_context *smu = adev->powerplay.pp_handle;
> + int ret = -EOPNOTSUPP;
> +
> + if (is_support_sw_smu(adev)) {
> + mutex_lock(&adev->pm.mutex);
> + ret = smu_set_pm_policy(smu, policy_type, policy_level);
> + mutex_unlock(&adev->pm.mutex);
> + }
> +
> + return ret;
> +}
> +
>  int amdgpu_dpm_enable_mgpu_fan_boost(struct amdgpu_device *adev)
>  {
>   void *pp_handle = adev->powerplay.pp_handle;
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> index f09b9d49297e..d8c8eaff3355 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> @@ -2155,6 +2155,96 @@ static ssize_t amdgpu_set_xgmi_plpd_policy(struct 
> device *dev,
>   return count;
>  }
>  
> +static ssize_t amdgpu_get_pm_policy(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct drm_device *ddev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(ddev);
> +
> + if (amdgpu_in_reset(adev))
> + return -EPERM;
> + if (adev->in_suspend && !adev->in_runpm)
> + return -EPERM;
> +
> + return amdgpu_dpm_get_pm_policy_info(adev, buf);
> +}
> +
> +static ssize_t amdgpu_set_pm_policy(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct drm_device *ddev = dev_get_drvdata(dev);
> + struct amdgpu_device *adev = drm_to_adev(ddev);
> + int policy_type, ret, num_params = 0;
> + char delimiter[] = " \n\t";
> + char tmp_buf[128];
> + char *tmp, *param;
> + long val;
> +
> + if (amdgpu_in_reset(adev))
> + return -EPERM;
> + if (adev->in_suspend && !adev->in_runpm)
> + return -EPERM;
> +
> + count = min(count, sizeof(tmp_buf));
> + memcpy(tmp_buf, buf, count);
> + tmp_buf[count - 1] = '\0';
> + tmp = tmp_buf;
> +
> + tmp = skip_spaces(tmp);
> + if (strncmp(tmp, "soc_pstate", strlen("soc_pstate")) == 0) {
> + policy_type = PP_PM_POLICY_SOC_PSTATE;
> + tmp += strlen("soc_pstate");
> + } else {
> + return -EINVAL;
> + }
> +
> + tmp = skip_spaces(tmp);
> + while ((param = strsep(&tmp, delimiter))) {
> + if (!strlen(param)) {
> + 

Re: [PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Lazar, Lijo



On 3/14/2024 1:19 AM, Felix Kuehling wrote:
> 
> On 2024-03-13 5:41, Lijo Lazar wrote:
>> Check if the device is present in the bus before trying to recover. It
>> could be that device itself is lost from the bus in some hang
>> situations.
>>
>> Signed-off-by: Lijo Lazar 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++
>>   1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 1e9454e6e4cb..b37113b79483 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -5536,6 +5536,23 @@ static inline void
>> amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
>>     }
>>   +static int amdgpu_device_health_check(struct list_head
>> *device_list_handle)
>> +{
>> +    struct amdgpu_device *tmp_adev;
>> +    int ret = 0;
>> +    u32 status;
>> +
>> +    list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
>> +    pci_read_config_dword(tmp_adev->pdev, PCI_COMMAND, &status);
>> +    if (PCI_POSSIBLE_ERROR(status)) {
>> +    dev_err(tmp_adev->dev, "device lost from bus!");
>> +    ret = -ENODEV;
> 
> You could just return here. What's the point of looking for other
> devices if you're going to return an error anyway?
> 

This is for the XGMI case; the error is primarily informational, to know
which devices in the hive got into a bad state.

Thanks,
Lijo

> Regards,
>   Felix
> 
> 
>> +    }
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>>   /**
>>    * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>>    *
>> @@ -5607,6 +5624,12 @@ int amdgpu_device_gpu_recover(struct
>> amdgpu_device *adev,
>>   device_list_handle = &device_list;
>>   }
>>   +    if (!amdgpu_sriov_vf(adev)) {
>> +    r = amdgpu_device_health_check(device_list_handle);
>> +    if (r)
>> +    goto end_reset;
>> +    }
>> +
>>   /* We need to lock reset domain only once both for XGMI and
>> single device */
>>   tmp_adev = list_first_entry(device_list_handle, struct
>> amdgpu_device,
>>   reset_list);
>> @@ -5772,6 +5795,7 @@ int amdgpu_device_gpu_recover(struct
>> amdgpu_device *adev,
>>   reset_list);
>>   amdgpu_device_unlock_reset_domain(tmp_adev->reset_domain);
>>   +end_reset:
>>   if (hive) {
>>   mutex_unlock(&hive->hive_lock);
>>   amdgpu_put_xgmi_hive(hive);


RE: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-03-13 Thread Liang, Prike
[AMD Official Use Only - General]

> From: Alex Deucher 
> Sent: Thursday, March 14, 2024 4:46 AM
> To: Kuehling, Felix 
> Cc: Sasha Levin ; linux-ker...@vger.kernel.org;
> sta...@vger.kernel.org; Liang, Prike ; Deucher,
> Alexander ; Koenig, Christian
> ; Pan, Xinhui ;
> airl...@gmail.com; dan...@ffwll.ch; Zhang, Hawking
> ; Lazar, Lijo ; Ma, Le
> ; Zhu, James ; Xiao, Shane
> ; Jiang, Sonny ; amd-
> g...@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> Subject: Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3
> abort cases on Raven series
>
> On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling 
> wrote:
> >
> > On 2024-03-11 11:14, Sasha Levin wrote:
> > > From: Prike Liang 
> > >
> > > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> > >
> > > Currently, GPU resets can now be performed successfully on the Raven
> > > series. While GPU reset is required for the S3 suspend abort case.
> > > So now can enable gpu reset for S3 abort cases on the Raven series.
> >
> > This looks suspicious to me. I'm not sure what conditions made the GPU
> > reset successful. But unless all the changes involved were also
> > backported, this should probably not be applied to older kernel
> > branches. I'm speculating it may be related to the removal of AMD
> IOMMUv2.
> >
>
> We should get confirmation from Prike, but I think he tested this on older
> kernels as well.
>
> Alex
>
> > Regards,
> >Felix
> >

The Raven/Raven2 series GPU reset function was enabled in some older kernel 
versions, such as 5.5, but filtered out in more recent kernel driver versions. 
Therefore, this patch only applies to the latest kernel version, and it should 
be safe, without affecting other cases, since it enables the Raven GPU reset 
only for the S3 suspend abort case. The Chrome kernel log indicates that the 
AMD IOMMUv2 driver is loaded, and with this patch triggering the GPU reset 
before the AMDGPU device reinitialization, it effectively handles the S3 
suspend abort resume problem on the Raven series.

Was the Raven GPU reset previously disabled due to the AMD IOMMUv2 driver? If 
so, based on the Chromebook's verification result, the Raven series GPU reset 
can probably be enabled with IOMMUv2 for other cases as well.

Thanks,
Prike
> >
> > >
> > > Signed-off-by: Prike Liang 
> > > Acked-by: Alex Deucher 
> > > Signed-off-by: Alex Deucher 
> > > Signed-off-by: Sasha Levin 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +--
> ---
> > >   1 file changed, 25 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > index 6a3486f52d698..ef5b3eedc8615 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > > @@ -605,11 +605,34 @@ soc15_asic_reset_method(struct
> amdgpu_device *adev)
> > >   return AMD_RESET_METHOD_MODE1;
> > >   }
> > >
> > > +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > > +{
> > > + u32 sol_reg;
> > > +
> > > + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > > +
> > > + /* Will reset for the following suspend abort cases.
> > > +  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > > +  * 2) S3 suspend abort and TOS already launched.
> > > +  */
> > > + if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > + !adev->suspend_complete &&
> > > + sol_reg)
> > > + return true;
> > > +
> > > + return false;
> > > +}
> > > +
> > >   static int soc15_asic_reset(struct amdgpu_device *adev)
> > >   {
> > >   /* original raven doesn't have full asic reset */
> > > - if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > - (adev->apu_flags & AMD_APU_IS_RAVEN2))
> > > + /* On the latest Raven, the GPU reset can be performed
> > > +  * successfully. So now, temporarily enable it for the
> > > +  * S3 suspend abort case.
> > > +  */
> > > + if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > > + (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
> > > + !soc15_need_reset_on_resume(adev))
> > >   return 0;
> > >
> > >   switch (soc15_asic_reset_method(adev)) { @@ -1490,24 +1513,6
> > > @@ static int soc15_common_suspend(void *handle)
> > >   return soc15_common_hw_fini(adev);
> > >   }
> > >
> > > -static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > > -{
> > > - u32 sol_reg;
> > > -
> > > - sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > > -
> > > - /* Will reset for the following suspend abort cases.
> > > -  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > > -  * 2) S3 suspend abort and TOS already launched.
> > > -  */
> > > - if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > > - !adev->suspend_complete &&
> > > - sol_reg)
> > > -  

Re: [PATCH] drm/amdgpu/vpe: power on vpe when hw_init

2024-03-13 Thread Lee, Peyton
[AMD Official Use Only - General]

Hi Alex,

There are two places where VPE will lose power: when the system calls 
vpe_hw_fini(), and when the VPE thread finds that VPE has no jobs in the queue.
This patch makes sure that VPE is powered up before loading firmware.
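
For context, a rough sketch of the power-down side described above; this only 
illustrates the pattern (the same powergating call with AMD_PG_STATE_GATE), 
the function name is made up, and it is not a quote of the actual 
vpe_hw_fini()/idle-work code:

static void vpe_power_down_sketch(struct amdgpu_device *adev)
{
	/* Mirror of the UNGATE call this patch adds in vpe_hw_init(). */
	if (amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VPE,
						   AMD_PG_STATE_GATE))
		dev_warn(adev->dev, "failed to gate VPE power\n");
}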

Thanks,
Peyton
-Original Message-
From: Alex Deucher 
Sent: Wednesday, March 13, 2024 9:17 PM
To: Lee, Peyton 
Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander 
; Zhang, Yifan ; Ma, Li 
Subject: Re: [PATCH] drm/amdgpu/vpe: power on vpe when hw_init
主旨: Re: [PATCH] drm/amdgpu/vpe: power on vpe when hw_init

On Wed, Mar 13, 2024 at 7:41 AM Peyton Lee  wrote:
>
> To fix mode2 reset failure.
> Should power on VPE when hw_init.

When does it get powered down again?  Won't this leave it powered up?

Alex

>
> Signed-off-by: Peyton Lee 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> index 70c5cc80ecdc..ecfe0f36e83e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> @@ -396,6 +396,12 @@ static int vpe_hw_init(void *handle)
> struct amdgpu_vpe *vpe = >vpe;
> int ret;
>
> +   /* Power on VPE */
> +   ret = amdgpu_device_ip_set_powergating_state(adev, 
> AMD_IP_BLOCK_TYPE_VPE,
> +AMD_PG_STATE_UNGATE);
> +   if (ret)
> +   return ret;
> +
> ret = vpe_load_microcode(vpe);
> if (ret)
> return ret;
> --
> 2.34.1
>


[PATCH] drm/sched: fix null-ptr-deref in init entity

2024-03-13 Thread vitaly.prosyak
From: Vitaly Prosyak 

The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASIC with a valid context.
The bug was reported by Joonkyo Jung .
For example, the following code:

static void Syzkaller2(int fd)
{
union drm_amdgpu_ctx arg1;
union drm_amdgpu_wait_cs arg2;

arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &arg1);

arg2.in.handle = 0x0;
arg2.in.timeout = 0x2;
arg2.in.ip_type = AMD_IP_VPE /* 0x9 */;
arg2.in.ip_instance = 0x0;
arg2.in.ring = 0x0;
arg2.in.ctx_id = arg1.out.alloc.ctx_id;

drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS */, &arg2);
}

One could expect the AMDGPU_WAIT_CS ioctl to return an error when no job was
previously submitted, but the following commit
1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
modified the logic and allowed sched_rq to be NULL.

As a result, when there is no job, the AMDGPU_WAIT_CS ioctl returns success.
The change fixes the null-ptr-deref in entity init; the stack below demonstrates
the error condition:

[  +0.07] BUG: kernel NULL pointer dereference, address: 0028
[  +0.007086] #PF: supervisor read access in kernel mode
[  +0.005234] #PF: error_code(0x) - not-present page
[  +0.005232] PGD 0 P4D 0
[  +0.002501] Oops:  [#1] PREEMPT SMP KASAN NOPTI
[  +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: GB   WL 
6.7.0+ #4
[  +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING 
(WI-FI), BIOS 1401 12/03/2020
[  +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 
00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 <48> 83 7b 28 
00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c
[  +0.019094] RSP: 0018:c90014c1fa40 EFLAGS: 00010282
[  +0.005237] RAX: 0001 RBX:  RCX: 8113f3fa
[  +0.007326] RDX: fbfff0a7889d RSI: 0008 RDI: 853c44e0
[  +0.007264] RBP: c90014c1fa80 R08: 0001 R09: fbfff0a7889c
[  +0.007266] R10: 853c44e7 R11: 0001 R12: 8881a719b010
[  +0.007263] R13: 88810d412748 R14: 0002 R15: 
[  +0.007264] FS:  77045540() GS:8883cc90() 
knlGS:
[  +0.008236] CS:  0010 DS:  ES:  CR0: 80050033
[  +0.005851] CR2: 0028 CR3: 00011912e000 CR4: 00350ef0
[  +0.007175] Call Trace:
[  +0.002561]  
[  +0.002141]  ? show_regs+0x6a/0x80
[  +0.003473]  ? __die+0x25/0x70
[  +0.003124]  ? page_fault_oops+0x214/0x720
[  +0.004179]  ? preempt_count_sub+0x18/0xc0
[  +0.004093]  ? __pfx_page_fault_oops+0x10/0x10
[  +0.004590]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? vprintk_default+0x1d/0x30
[  +0.004063]  ? srso_return_thunk+0x5/0x5f
[  +0.004087]  ? vprintk+0x5c/0x90
[  +0.003296]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005807]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _printk+0xb3/0xe0
[  +0.003293]  ? __pfx__printk+0x10/0x10
[  +0.003735]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  +0.005482]  ? do_user_addr_fault+0x345/0x770
[  +0.004361]  ? exc_page_fault+0x64/0xf0
[  +0.003972]  ? asm_exc_page_fault+0x27/0x30
[  +0.004271]  ? add_taint+0x2a/0xa0
[  +0.003476]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005812]  amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu]
[  +0.009530]  ? finish_task_switch.isra.0+0x129/0x470
[  +0.005068]  ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu]
[  +0.010063]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004001]  ? mutex_unlock+0x81/0xd0
[  +0.003802]  ? srso_return_thunk+0x5/0x5f
[  +0.004096]  amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu]
[  +0.009355]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009981]  ? srso_return_thunk+0x5/0x5f
[  +0.004089]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __srcu_read_lock+0x20/0x50
[  +0.004096]  drm_ioctl_kernel+0x140/0x1f0 [drm]
[  +0.005080]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009974]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.005618]  ? srso_return_thunk+0x5/0x5f
[  +0.004088]  ? __kasan_check_write+0x14/0x20
[  +0.004357]  drm_ioctl+0x3da/0x730 [drm]
[  +0.004461]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009979]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.004993]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.004712]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.005063]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[  +0.005477]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? preempt_count_sub+0x18/0xc0
[  +0.004237]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? 
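
For readers following along, a minimal sketch of the kind of defensive check 
such a fix adds in drm_sched_entity_init(); this is an illustration of the 
idea under the assumption that sched_rq is the field left NULL, not the exact 
upstream hunk:

int drm_sched_entity_init(struct drm_sched_entity *entity,
			  enum drm_sched_priority priority,
			  struct drm_gpu_scheduler **sched_list,
			  unsigned int num_sched_list,
			  atomic_t *guilty)
{
	/* Illustration only: refuse to build an entity on a scheduler whose
	 * run-queues were never initialized, so a WAIT_CS on an engine with
	 * no submissions fails cleanly instead of dereferencing NULL later.
	 */
	if (!sched_list || !num_sched_list ||
	    !sched_list[0] || !sched_list[0]->sched_rq)
		return -ENOENT;

	/* ... existing initialization continues unchanged ... */
	return 0;
}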

[PATCH] drm/scheduler: fix null-ptr-deref in init entity

2024-03-13 Thread vitaly.prosyak
From: Vitaly Prosyak 

The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASIC with a valid context.
The bug was reported by Joonkyo Jung .
For example, the following code:

static void Syzkaller2(int fd)
{
union drm_amdgpu_ctx arg1;
union drm_amdgpu_wait_cs arg2;

arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &arg1);

arg2.in.handle = 0x0;
arg2.in.timeout = 0x2;
arg2.in.ip_type = AMD_IP_VPE /* 0x9 */;
arg2.in.ip_instance = 0x0;
arg2.in.ring = 0x0;
arg2.in.ctx_id = arg1.out.alloc.ctx_id;

drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS */, &arg2);
}

One could expect the AMDGPU_WAIT_CS ioctl to return an error when no job was
previously submitted, but the following commit
1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
modified the logic and allowed sched_rq to be NULL.

As a result, when there is no job, the AMDGPU_WAIT_CS ioctl returns success.
The change fixes the null-ptr-deref in entity init; the stack below demonstrates
the error condition:

[  +0.07] BUG: kernel NULL pointer dereference, address: 0028
[  +0.007086] #PF: supervisor read access in kernel mode
[  +0.005234] #PF: error_code(0x) - not-present page
[  +0.005232] PGD 0 P4D 0
[  +0.002501] Oops:  [#1] PREEMPT SMP KASAN NOPTI
[  +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: GB   WL 
6.7.0+ #4
[  +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING 
(WI-FI), BIOS 1401 12/03/2020
[  +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 
00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 <48> 83 7b 28 
00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c
[  +0.019094] RSP: 0018:c90014c1fa40 EFLAGS: 00010282
[  +0.005237] RAX: 0001 RBX:  RCX: 8113f3fa
[  +0.007326] RDX: fbfff0a7889d RSI: 0008 RDI: 853c44e0
[  +0.007264] RBP: c90014c1fa80 R08: 0001 R09: fbfff0a7889c
[  +0.007266] R10: 853c44e7 R11: 0001 R12: 8881a719b010
[  +0.007263] R13: 88810d412748 R14: 0002 R15: 
[  +0.007264] FS:  77045540() GS:8883cc90() 
knlGS:
[  +0.008236] CS:  0010 DS:  ES:  CR0: 80050033
[  +0.005851] CR2: 0028 CR3: 00011912e000 CR4: 00350ef0
[  +0.007175] Call Trace:
[  +0.002561]  
[  +0.002141]  ? show_regs+0x6a/0x80
[  +0.003473]  ? __die+0x25/0x70
[  +0.003124]  ? page_fault_oops+0x214/0x720
[  +0.004179]  ? preempt_count_sub+0x18/0xc0
[  +0.004093]  ? __pfx_page_fault_oops+0x10/0x10
[  +0.004590]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? vprintk_default+0x1d/0x30
[  +0.004063]  ? srso_return_thunk+0x5/0x5f
[  +0.004087]  ? vprintk+0x5c/0x90
[  +0.003296]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005807]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _printk+0xb3/0xe0
[  +0.003293]  ? __pfx__printk+0x10/0x10
[  +0.003735]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  +0.005482]  ? do_user_addr_fault+0x345/0x770
[  +0.004361]  ? exc_page_fault+0x64/0xf0
[  +0.003972]  ? asm_exc_page_fault+0x27/0x30
[  +0.004271]  ? add_taint+0x2a/0xa0
[  +0.003476]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005812]  amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu]
[  +0.009530]  ? finish_task_switch.isra.0+0x129/0x470
[  +0.005068]  ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu]
[  +0.010063]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004001]  ? mutex_unlock+0x81/0xd0
[  +0.003802]  ? srso_return_thunk+0x5/0x5f
[  +0.004096]  amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu]
[  +0.009355]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009981]  ? srso_return_thunk+0x5/0x5f
[  +0.004089]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __srcu_read_lock+0x20/0x50
[  +0.004096]  drm_ioctl_kernel+0x140/0x1f0 [drm]
[  +0.005080]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009974]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.005618]  ? srso_return_thunk+0x5/0x5f
[  +0.004088]  ? __kasan_check_write+0x14/0x20
[  +0.004357]  drm_ioctl+0x3da/0x730 [drm]
[  +0.004461]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009979]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.004993]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.004712]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.005063]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[  +0.005477]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? preempt_count_sub+0x18/0xc0
[  +0.004237]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? 

[PATCH] Documentation: add a page on amdgpu debugging

2024-03-13 Thread Alex Deucher
Covers GPU page fault debugging and adds a reference
to umr.

Signed-off-by: Alex Deucher 
---
 Documentation/gpu/amdgpu/debugging.rst | 79 ++
 Documentation/gpu/amdgpu/index.rst |  1 +
 2 files changed, 80 insertions(+)
 create mode 100644 Documentation/gpu/amdgpu/debugging.rst

diff --git a/Documentation/gpu/amdgpu/debugging.rst 
b/Documentation/gpu/amdgpu/debugging.rst
new file mode 100644
index ..29971a7a6815
--- /dev/null
+++ b/Documentation/gpu/amdgpu/debugging.rst
@@ -0,0 +1,79 @@
+===
+ GPU Debugging
+===
+
+GPUVM Debugging
+===
+
+To aid in debugging GPU virtual memory related problems, the driver supports a
+number of module parameters:
+
+`vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault.
+
+`vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than
+the GPU.
+
+
+Decoding a GPUVM Page Fault
+===
+
+If you see a GPU page fault in the kernel log, you can decode it to figure
+out what is going wrong in your application.  A page fault in your kernel
+log may look something like this:
+
+::
+
+ [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for 
process glxinfo pid 2424 thread glxinfo:cs0 pid 2425)
+   in page starting at address 0x80010280 from IH client 0x1b (UTCL2)
+ VM_L2_PROTECTION_FAULT_STATUS:0x00301030
+   Faulty UTCL2 client ID: TCP (0x8)
+   MORE_FAULTS: 0x0
+   WALKER_ERROR: 0x0
+   PERMISSION_FAULTS: 0x3
+   MAPPING_ERROR: 0x0
+   RW: 0x0
+
+First you have the memory hub, gfxhub and mmhub.  gfxhub is the memory
+hub used for graphics, compute, and sdma on some chips.  mmhub is the
+memory hub used for multi-media and sdma on some chips.
+
+Next you have the vmid and pasid.  If the vmid is 0, this fault was likely
+caused by the kernel driver or firmware.  If the vmid is non-0, it is generally
+a fault in a user application.  The pasid is used to link a vmid to a system
+process id.  If the process is active when the fault happens, the process
+information will be printed.
+
+The GPU virtual address that caused the fault comes next.
+
+The client ID indicates the GPU block that caused the fault.
+Some common client IDs:
+
+- CB/DB: The color/depth backend of the graphics pipe
+- CPF: Command Processor Frontend
+- CPC: Command Processor Compute
+- CPG: Command Processor Graphics
+- TCP: Shaders
+- SDMA: SDMA engines
+- VCN: Video encode/decode engines
+- JPEG: JPEG engines
+
+PERMISSION_FAULTS describe what faults were encountered:
+
+- bit 0: the PTE was not valid
+- bit 1: the PTE read bit was not set
+- bit 2: the PTE write bit was not set
+- bit 3: the PTE execute bit was not set
+
+Finally, RW indicates whether the access was a read (0) or a write (1).
+
+In the example above, a shader (client id = TCP) generated a read (RW = 0x0) to
+an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address
+0x80010280.  The user can then inspect their shader
+code and resource descriptor state to determine what caused the GPU page fault.
+
+UMR
+===
+
+`umr `_ is a general purpose
+GPU debugging and diagnostics tool.  Please see the umr documentation for
+more information about its capabilities.
diff --git a/Documentation/gpu/amdgpu/index.rst 
b/Documentation/gpu/amdgpu/index.rst
index 912e699fd373..847e04924030 100644
--- a/Documentation/gpu/amdgpu/index.rst
+++ b/Documentation/gpu/amdgpu/index.rst
@@ -15,4 +15,5 @@ Next (GCN), Radeon DNA (RDNA), and Compute DNA (CDNA) 
architectures.
ras
thermal
driver-misc
+   debugging
amdgpu-glossary
-- 
2.44.0



Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-03-13 Thread Alex Deucher
On Wed, Mar 13, 2024 at 4:12 PM Felix Kuehling  wrote:
>
> On 2024-03-11 11:14, Sasha Levin wrote:
> > From: Prike Liang 
> >
> > [ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]
> >
> > Currently, GPU resets can now be performed successfully on the Raven
> > series. While GPU reset is required for the S3 suspend abort case.
> > So now can enable gpu reset for S3 abort cases on the Raven series.
>
> This looks suspicious to me. I'm not sure what conditions made the GPU
> reset successful. But unless all the changes involved were also
> backported, this should probably not be applied to older kernel
> branches. I'm speculating it may be related to the removal of AMD IOMMUv2.
>

We should get confirmation from Prike, but I think he tested this on
older kernels as well.

Alex

> Regards,
>Felix
>
>
> >
> > Signed-off-by: Prike Liang 
> > Acked-by: Alex Deucher 
> > Signed-off-by: Alex Deucher 
> > Signed-off-by: Sasha Levin 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +-
> >   1 file changed, 25 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 6a3486f52d698..ef5b3eedc8615 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -605,11 +605,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
> >   return AMD_RESET_METHOD_MODE1;
> >   }
> >
> > +static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > +{
> > + u32 sol_reg;
> > +
> > + sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> > + /* Will reset for the following suspend abort cases.
> > +  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > +  * 2) S3 suspend abort and TOS already launched.
> > +  */
> > + if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > + !adev->suspend_complete &&
> > + sol_reg)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> >   static int soc15_asic_reset(struct amdgpu_device *adev)
> >   {
> >   /* original raven doesn't have full asic reset */
> > - if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > - (adev->apu_flags & AMD_APU_IS_RAVEN2))
> > + /* On the latest Raven, the GPU reset can be performed
> > +  * successfully. So now, temporarily enable it for the
> > +  * S3 suspend abort case.
> > +  */
> > + if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
> > + (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
> > + !soc15_need_reset_on_resume(adev))
> >   return 0;
> >
> >   switch (soc15_asic_reset_method(adev)) {
> > @@ -1490,24 +1513,6 @@ static int soc15_common_suspend(void *handle)
> >   return soc15_common_hw_fini(adev);
> >   }
> >
> > -static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
> > -{
> > - u32 sol_reg;
> > -
> > - sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > -
> > - /* Will reset for the following suspend abort cases.
> > -  * 1) Only reset limit on APU side, dGPU hasn't checked yet.
> > -  * 2) S3 suspend abort and TOS already launched.
> > -  */
> > - if (adev->flags & AMD_IS_APU && adev->in_s3 &&
> > - !adev->suspend_complete &&
> > - sol_reg)
> > - return true;
> > -
> > - return false;
> > -}
> > -
> >   static int soc15_common_resume(void *handle)
> >   {
> >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;


Re: [PATCH 2/2] drm:amdgpu: add firmware information of all IP's

2024-03-13 Thread Alex Deucher
On Tue, Mar 12, 2024 at 8:42 AM Sunil Khatri  wrote:
>
> Add firmware version information of each
> IP and each instance where applicable.
>

Is there a way we can share some common code with devcoredump,
debugfs, and the info IOCTL?  All three places need to query this
information and the same logic is repeated in each case.

Alex


> Signed-off-by: Sunil Khatri 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 ++
>  1 file changed, 122 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> index 611fdb90a1fc..78ddc58aef67 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> @@ -168,6 +168,123 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool 
> vram_lost,
>  {
>  }
>  #else
> +static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev, struct 
> drm_printer *p)
> +{
> +   uint32_t version;
> +   uint32_t feature;
> +   uint8_t smu_program, smu_major, smu_minor, smu_debug;
> +
> +   drm_printf(p, "VCE feature version: %u, fw version: 0x%08x\n",
> +  adev->vce.fb_version, adev->vce.fw_version);
> +   drm_printf(p, "UVD feature version: %u, fw version: 0x%08x\n",
> +  0, adev->uvd.fw_version);
> +   drm_printf(p, "GMC feature version: %u, fw version: 0x%08x\n",
> +  0, adev->gmc.fw_version);
> +   drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.me_feature_version, adev->gfx.me_fw_version);
> +   drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.pfp_feature_version, adev->gfx.pfp_fw_version);
> +   drm_printf(p, "CE feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.ce_feature_version, adev->gfx.ce_fw_version);
> +   drm_printf(p, "RLC feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlc_feature_version, adev->gfx.rlc_fw_version);
> +
> +   drm_printf(p, "RLC SRLC feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlc_srlc_feature_version,
> +  adev->gfx.rlc_srlc_fw_version);
> +   drm_printf(p, "RLC SRLG feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlc_srlg_feature_version,
> +  adev->gfx.rlc_srlg_fw_version);
> +   drm_printf(p, "RLC SRLS feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlc_srls_feature_version,
> +  adev->gfx.rlc_srls_fw_version);
> +   drm_printf(p, "RLCP feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlcp_ucode_feature_version,
> +  adev->gfx.rlcp_ucode_version);
> +   drm_printf(p, "RLCV feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.rlcv_ucode_feature_version,
> +  adev->gfx.rlcv_ucode_version);
> +   drm_printf(p, "MEC feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.mec_feature_version,
> +  adev->gfx.mec_fw_version);
> +
> +   if (adev->gfx.mec2_fw)
> +   drm_printf(p,
> +  "MEC2 feature version: %u, fw version: 0x%08x\n",
> +  adev->gfx.mec2_feature_version,
> +  adev->gfx.mec2_fw_version);
> +
> +   drm_printf(p, "IMU feature version: %u, fw version: 0x%08x\n",
> +  0, adev->gfx.imu_fw_version);
> +   drm_printf(p, "PSP SOS feature version: %u, fw version: 0x%08x\n",
> +  adev->psp.sos.feature_version,
> +  adev->psp.sos.fw_version);
> +   drm_printf(p, "PSP ASD feature version: %u, fw version: 0x%08x\n",
> +  adev->psp.asd_context.bin_desc.feature_version,
> +  adev->psp.asd_context.bin_desc.fw_version);
> +
> +   drm_printf(p, "TA XGMI feature version: 0x%08x, fw version: 0x%08x\n",
> +  adev->psp.xgmi_context.context.bin_desc.feature_version,
> +  adev->psp.xgmi_context.context.bin_desc.fw_version);
> +   drm_printf(p, "TA RAS feature version: 0x%08x, fw version: 0x%08x\n",
> +  adev->psp.ras_context.context.bin_desc.feature_version,
> +  adev->psp.ras_context.context.bin_desc.fw_version);
> +   drm_printf(p, "TA HDCP feature version: 0x%08x, fw version: 0x%08x\n",
> +  adev->psp.hdcp_context.context.bin_desc.feature_version,
> +  adev->psp.hdcp_context.context.bin_desc.fw_version);
> +   drm_printf(p, "TA DTM feature version: 0x%08x, fw version: 0x%08x\n",
> +  adev->psp.dtm_context.context.bin_desc.feature_version,
> +  adev->psp.dtm_context.context.bin_desc.fw_version);
> +   drm_printf(p, "TA RAP feature version: 0x%08x, fw version: 0x%08x\n",
> +  

Re: [PATCH 1/2] drm/amdgpu: add the IP information of the soc

2024-03-13 Thread Alex Deucher
On Tue, Mar 12, 2024 at 8:41 AM Sunil Khatri  wrote:
>
> Add all the IP's information on a SOC to the
> devcoredump.
>
> Signed-off-by: Sunil Khatri 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> index a0dbccad2f53..611fdb90a1fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
> @@ -196,6 +196,25 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, 
> size_t count,
>coredump->reset_task_info.process_name,
>coredump->reset_task_info.pid);
>
> +   /* GPU IP's information of the SOC */
> +   if (coredump->adev) {
> +   drm_printf(&p, "\nIP Information\n");
> +   drm_printf(&p, "SOC Family: %d\n", coredump->adev->family);
> +   drm_printf(&p, "SOC Revision id: %d\n",
> +  coredump->adev->rev_id);
> +
> +   for (int i = 0; i < coredump->adev->num_ip_blocks; i++) {
> +   struct amdgpu_ip_block *ip =
> +   &coredump->adev->ip_blocks[i];
> +   drm_printf(&p, "IP type: %d IP name: %s\n",
> +  ip->version->type,
> +  ip->version->funcs->name);
> +   drm_printf(&p, "IP version: (%d,%d,%d)\n\n",
> +  ip->version->major,
> +  ip->version->minor,
> +  ip->version->rev);
> +   }
> +   }

I think the IP discovery table would be more useful.  Either walk the
adev->ip_versions structure, or just include the IP discovery binary.

Alex

> +
> if (coredump->ring) {
> drm_printf(&p, "\nRing timed out details\n");
> drm_printf(&p, "IP Type: %d Ring Name: %s\n",
> --
> 2.34.1
>


Re: [PATCH AUTOSEL 5.15 3/5] drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series

2024-03-13 Thread Felix Kuehling

On 2024-03-11 11:14, Sasha Levin wrote:

From: Prike Liang 

[ Upstream commit c671ec01311b4744b377f98b0b4c6d033fe569b3 ]

Currently, GPU resets can now be performed successfully on the Raven
series. While GPU reset is required for the S3 suspend abort case.
So now can enable gpu reset for S3 abort cases on the Raven series.


This looks suspicious to me. I'm not sure what conditions made the GPU 
reset successful. But unless all the changes involved were also 
backported, this should probably not be applied to older kernel 
branches. I'm speculating it may be related to the removal of AMD IOMMUv2.


Regards,
  Felix




Signed-off-by: Prike Liang 
Acked-by: Alex Deucher 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
  drivers/gpu/drm/amd/amdgpu/soc15.c | 45 +-
  1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 6a3486f52d698..ef5b3eedc8615 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -605,11 +605,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
return AMD_RESET_METHOD_MODE1;
  }
  
+static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)

+{
+   u32 sol_reg;
+
+   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
+
+   /* Will reset for the following suspend abort cases.
+* 1) Only reset limit on APU side, dGPU hasn't checked yet.
+* 2) S3 suspend abort and TOS already launched.
+*/
+   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
+   !adev->suspend_complete &&
+   sol_reg)
+   return true;
+
+   return false;
+}
+
  static int soc15_asic_reset(struct amdgpu_device *adev)
  {
/* original raven doesn't have full asic reset */
-   if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
-   (adev->apu_flags & AMD_APU_IS_RAVEN2))
+   /* On the latest Raven, the GPU reset can be performed
+* successfully. So now, temporarily enable it for the
+* S3 suspend abort case.
+*/
+   if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
+   (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
+   !soc15_need_reset_on_resume(adev))
return 0;
  
  	switch (soc15_asic_reset_method(adev)) {

@@ -1490,24 +1513,6 @@ static int soc15_common_suspend(void *handle)
return soc15_common_hw_fini(adev);
  }
  
-static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)

-{
-   u32 sol_reg;
-
-   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
-
-   /* Will reset for the following suspend abort cases.
-* 1) Only reset limit on APU side, dGPU hasn't checked yet.
-* 2) S3 suspend abort and TOS already launched.
-*/
-   if (adev->flags & AMD_IS_APU && adev->in_s3 &&
-   !adev->suspend_complete &&
-   sol_reg)
-   return true;
-
-   return false;
-}
-
  static int soc15_common_resume(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;


Re: [PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Felix Kuehling



On 2024-03-13 5:41, Lijo Lazar wrote:

Check if the device is present in the bus before trying to recover. It
could be that device itself is lost from the bus in some hang
situations.

Signed-off-by: Lijo Lazar 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++
  1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1e9454e6e4cb..b37113b79483 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5536,6 +5536,23 @@ static inline void 
amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
  
  }
  
+static int amdgpu_device_health_check(struct list_head *device_list_handle)

+{
+   struct amdgpu_device *tmp_adev;
+   int ret = 0;
+   u32 status;
+
+   list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+   pci_read_config_dword(tmp_adev->pdev, PCI_COMMAND, &status);
+   if (PCI_POSSIBLE_ERROR(status)) {
+   dev_err(tmp_adev->dev, "device lost from bus!");
+   ret = -ENODEV;


You could just return here. What's the point of looking for other 
devices if you're going to return an error anyway?


Regards,
  Felix



+   }
+   }
+
+   return ret;
+}
+
  /**
   * amdgpu_device_gpu_recover - reset the asic and recover scheduler
   *
@@ -5607,6 +5624,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
device_list_handle = &device_list;
}
  
+	if (!amdgpu_sriov_vf(adev)) {

+   r = amdgpu_device_health_check(device_list_handle);
+   if (r)
+   goto end_reset;
+   }
+
/* We need to lock reset domain only once both for XGMI and single 
device */
tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
reset_list);
@@ -5772,6 +5795,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
reset_list);
amdgpu_device_unlock_reset_domain(tmp_adev->reset_domain);
  
+end_reset:

if (hive) {
mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);


Re: [PATCH 2/2] drm:amdgpu: add firmware information of all IP's

2024-03-13 Thread Khatri, Sunil
[AMD Official Use Only - General]

Gentle reminder

Regards
Sunil

Get Outlook for Android

From: Sunil Khatri 
Sent: Tuesday, March 12, 2024 6:11:48 PM
To: Deucher, Alexander ; Koenig, Christian 
; Sharma, Shashank 
Cc: amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org ; 
linux-ker...@vger.kernel.org ; Khatri, Sunil 

Subject: [PATCH 2/2] drm:amdgpu: add firmware information of all IP's

Add firmware version information of each
IP and each instance where applicable.

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 122 ++
 1 file changed, 122 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 611fdb90a1fc..78ddc58aef67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -168,6 +168,123 @@ void amdgpu_coredump(struct amdgpu_device *adev, bool 
vram_lost,
 {
 }
 #else
+static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev, struct 
drm_printer *p)
+{
+   uint32_t version;
+   uint32_t feature;
+   uint8_t smu_program, smu_major, smu_minor, smu_debug;
+
+   drm_printf(p, "VCE feature version: %u, fw version: 0x%08x\n",
+  adev->vce.fb_version, adev->vce.fw_version);
+   drm_printf(p, "UVD feature version: %u, fw version: 0x%08x\n",
+  0, adev->uvd.fw_version);
+   drm_printf(p, "GMC feature version: %u, fw version: 0x%08x\n",
+  0, adev->gmc.fw_version);
+   drm_printf(p, "ME feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.me_feature_version, adev->gfx.me_fw_version);
+   drm_printf(p, "PFP feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.pfp_feature_version, adev->gfx.pfp_fw_version);
+   drm_printf(p, "CE feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.ce_feature_version, adev->gfx.ce_fw_version);
+   drm_printf(p, "RLC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_feature_version, adev->gfx.rlc_fw_version);
+
+   drm_printf(p, "RLC SRLC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srlc_feature_version,
+  adev->gfx.rlc_srlc_fw_version);
+   drm_printf(p, "RLC SRLG feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srlg_feature_version,
+  adev->gfx.rlc_srlg_fw_version);
+   drm_printf(p, "RLC SRLS feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlc_srls_feature_version,
+  adev->gfx.rlc_srls_fw_version);
+   drm_printf(p, "RLCP feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlcp_ucode_feature_version,
+  adev->gfx.rlcp_ucode_version);
+   drm_printf(p, "RLCV feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.rlcv_ucode_feature_version,
+  adev->gfx.rlcv_ucode_version);
+   drm_printf(p, "MEC feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.mec_feature_version,
+  adev->gfx.mec_fw_version);
+
+   if (adev->gfx.mec2_fw)
+   drm_printf(p,
+  "MEC2 feature version: %u, fw version: 0x%08x\n",
+  adev->gfx.mec2_feature_version,
+  adev->gfx.mec2_fw_version);
+
+   drm_printf(p, "IMU feature version: %u, fw version: 0x%08x\n",
+  0, adev->gfx.imu_fw_version);
+   drm_printf(p, "PSP SOS feature version: %u, fw version: 0x%08x\n",
+  adev->psp.sos.feature_version,
+  adev->psp.sos.fw_version);
+   drm_printf(p, "PSP ASD feature version: %u, fw version: 0x%08x\n",
+  adev->psp.asd_context.bin_desc.feature_version,
+  adev->psp.asd_context.bin_desc.fw_version);
+
+   drm_printf(p, "TA XGMI feature version: 0x%08x, fw version: 0x%08x\n",
+  adev->psp.xgmi_context.context.bin_desc.feature_version,
+  adev->psp.xgmi_context.context.bin_desc.fw_version);
+   drm_printf(p, "TA RAS feature version: 0x%08x, fw version: 0x%08x\n",
+  adev->psp.ras_context.context.bin_desc.feature_version,
+  adev->psp.ras_context.context.bin_desc.fw_version);
+   drm_printf(p, "TA HDCP feature version: 0x%08x, fw version: 0x%08x\n",
+  adev->psp.hdcp_context.context.bin_desc.feature_version,
+  adev->psp.hdcp_context.context.bin_desc.fw_version);
+   drm_printf(p, "TA DTM feature version: 0x%08x, fw version: 0x%08x\n",
+  adev->psp.dtm_context.context.bin_desc.feature_version,
+  adev->psp.dtm_context.context.bin_desc.fw_version);
+   drm_printf(p, "TA RAP feature version: 0x%08x, fw version: 0x%08x\n",
+ 

Re: [PATCH 1/2] drm/amdgpu: add the IP information of the soc

2024-03-13 Thread Khatri, Sunil
[AMD Official Use Only - General]

Gentle reminder for review.

Regards
Sunil

Get Outlook for Android

From: Sunil Khatri 
Sent: Tuesday, March 12, 2024 6:11:47 PM
To: Deucher, Alexander ; Koenig, Christian 
; Sharma, Shashank 
Cc: amd-gfx@lists.freedesktop.org ; 
dri-de...@lists.freedesktop.org ; 
linux-ker...@vger.kernel.org ; Khatri, Sunil 

Subject: [PATCH 1/2] drm/amdgpu: add the IP information of the soc

Add all the IP's information on a SOC to the
devcoredump.

Signed-off-by: Sunil Khatri 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index a0dbccad2f53..611fdb90a1fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -196,6 +196,25 @@ amdgpu_devcoredump_read(char *buffer, loff_t offset, 
size_t count,
coredump->reset_task_info.process_name,
coredump->reset_task_info.pid);

+   /* GPU IP's information of the SOC */
+   if (coredump->adev) {
+   drm_printf(&p, "\nIP Information\n");
+   drm_printf(&p, "SOC Family: %d\n", coredump->adev->family);
+   drm_printf(&p, "SOC Revision id: %d\n", coredump->adev->rev_id);
+
+   for (int i = 0; i < coredump->adev->num_ip_blocks; i++) {
+   struct amdgpu_ip_block *ip =
+   &coredump->adev->ip_blocks[i];
+   drm_printf(&p, "IP type: %d IP name: %s\n",
+  ip->version->type,
+  ip->version->funcs->name);
+   drm_printf(&p, "IP version: (%d,%d,%d)\n\n",
+  ip->version->major,
+  ip->version->minor,
+  ip->version->rev);
+   }
+   }
+
 if (coredump->ring) {
 drm_printf(&p, "\nRing timed out details\n");
 drm_printf(&p, "IP Type: %d Ring Name: %s\n",
--
2.34.1
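
For reference, the IP section added above would render in the devcoredump file
roughly like this (the values below are made up for illustration; the family,
revision id and the list of IP blocks depend entirely on the ASIC):

IP Information
SOC Family: 143
SOC Revision id: 0
IP type: 0 IP name: soc21_common
IP version: (11,0,0)

IP type: 1 IP name: gmc_v11_0
IP version: (11,0,0)

...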





Re: [PATCH] drm/amd/amdgpu: Enable IH Retry CAM by register read

2024-03-13 Thread Felix Kuehling

On 2024-03-13 13:43, Dewan Alam wrote:

IH Retry CAM should be enabled by register reads instead of always being set to 
true.
This explanation sounds odd. Your code is still writing the register 
first. What's the reason for reading back the register? I assume it's 
not needed for enabling the CAM, but to check whether it was enabled 
successfully. What are the configurations where it cannot be enabled 
successfully?


Two more nit-picks inline ...




Signed-off-by: Dewan Alam 
---
  drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index b9e785846637..c330f5a88a06 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -337,13 +337,20 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
  
  	/* Enable IH Retry CAM */

if (amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 0) ||
-   amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2))
+   amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2)) {
WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL_ALDEBARAN,
   ENABLE, 1);
-   else
+   adev->irq.retry_cam_enabled = REG_GET_FIELD(
+   RREG32_SOC15(OSSSYS, 0,
+   mmIH_RETRY_INT_CAM_CNTL_ALDEBARAN),
+   IH_RETRY_INT_CAM_CNTL_ALDEBARAN, ENABLE);
+   } else {


Indentation looks wrong here.


WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL, ENABLE, 1);
-
-   adev->irq.retry_cam_enabled = true;
+   adev->irq.retry_cam_enabled = REG_GET_FIELD(
+   RREG32_SOC15(OSSSYS, 0,
+   mmIH_RETRY_INT_CAM_CNTL),
+   IH_RETRY_INT_CAM_CNTL, ENABLE);
+   }


Wrong indentation.

Regards,
  Felix

  
  	/* enable interrupts */

ret = vega20_ih_toggle_interrupts(adev, true);


[PATCH] drm/amd/amdgpu: Enable IH Retry CAM by register read

2024-03-13 Thread Dewan Alam
IH Retry CAM should be enabled by register reads instead of always being set to 
true.

Signed-off-by: Dewan Alam 
---
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c 
b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
index b9e785846637..c330f5a88a06 100644
--- a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/vega20_ih.c
@@ -337,13 +337,20 @@ static int vega20_ih_irq_init(struct amdgpu_device *adev)
 
/* Enable IH Retry CAM */
if (amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 0) ||
-   amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2))
+   amdgpu_ip_version(adev, OSSSYS_HWIP, 0) == IP_VERSION(4, 4, 2)) {
WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL_ALDEBARAN,
   ENABLE, 1);
-   else
+   adev->irq.retry_cam_enabled = REG_GET_FIELD(
+   RREG32_SOC15(OSSSYS, 0,
+   mmIH_RETRY_INT_CAM_CNTL_ALDEBARAN),
+   IH_RETRY_INT_CAM_CNTL_ALDEBARAN, ENABLE);
+   } else {
WREG32_FIELD15(OSSSYS, 0, IH_RETRY_INT_CAM_CNTL, ENABLE, 1);
-
-   adev->irq.retry_cam_enabled = true;
+   adev->irq.retry_cam_enabled = REG_GET_FIELD(
+   RREG32_SOC15(OSSSYS, 0,
+   mmIH_RETRY_INT_CAM_CNTL),
+   IH_RETRY_INT_CAM_CNTL, ENABLE);
+   }
 
/* enable interrupts */
ret = vega20_ih_toggle_interrupts(adev, true);
-- 
2.34.1



Re: [PATCH 33/43] drm/amd/display: Prevent crash on bring-up

2024-03-13 Thread Pillai, Aurabindo

Might want to avoid bringup in the commit description

--

Regards,
Jay

From: Wayne Lin 
Sent: Tuesday, March 12, 2024 5:20 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Siqueira, Rodrigo ; Pillai, 
Aurabindo ; Li, Roman ; Lin, Wayne 
; Gutierrez, Agustin ; Chung, 
ChiaHsuan (Tom) ; Wu, Hersen ; 
Zuo, Jerry ; Park, Chris ; Limonciello, 
Mario ; Deucher, Alexander 
; sta...@vger.kernel.org ; 
Liu, Charlene 
Subject: [PATCH 33/43] drm/amd/display: Prevent crash on bring-up

From: Chris Park 

[Why]
Disabling stream encoder invokes a function that no longer exists
in bring-up.

[How]
Check if the function declaration is NULL in disable stream encoder.

Cc: Mario Limonciello 
Cc: Alex Deucher 
Cc: sta...@vger.kernel.org
Reviewed-by: Charlene Liu 
Acked-by: Wayne Lin 
Signed-off-by: Chris Park 
---
 drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 9d5df4c0da59..0ba1feaf96c0 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -1185,7 +1185,8 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx)
 if (dccg) {
 dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst);
 dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, 
dp_hpo_inst);
-   dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
+   if (dccg && dccg->funcs->set_dtbclk_dto)
+   dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
 }
 } else if (dccg && dccg->funcs->disable_symclk_se) {
 dccg->funcs->disable_symclk_se(dccg, 
stream_enc->stream_enc_inst,
--
2.37.3



[PATCH] drm/amdkfd: range check cp bad op exception interrupts

2024-03-13 Thread Jonathan Kim
Due to a CP interrupt bug, bad packet garbage exception codes are raised.
Do a range check so that the debugger and runtime do not receive garbage
codes.
Update the user api to guard exception code type checking as well.

Signed-off-by: Jonathan Kim 
Tested-by: Jesse Zhang 
---
 .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c|  3 ++-
 .../gpu/drm/amd/amdkfd/kfd_int_process_v11.c|  3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c |  3 ++-
 include/uapi/linux/kfd_ioctl.h  | 17 ++---
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
index a8e76287dde0..013d0a073b9b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
@@ -339,7 +339,8 @@ static void event_interrupt_wq_v10(struct kfd_node *dev,
break;
}
kfd_signal_event_interrupt(pasid, context_id0 & 
0x7fffff, 23);
-   } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
+   } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE &&
+  
KFD_DBG_EC_TYPE_IS_PACKET(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0))) {
kfd_set_dbg_ev_from_interrupt(dev, pasid,
KFD_DEBUG_DOORBELL_ID(context_id0),

KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
index 7e2859736a55..fe2ad0c0de95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v11.c
@@ -328,7 +328,8 @@ static void event_interrupt_wq_v11(struct kfd_node *dev,
/* CP */
if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
kfd_signal_event_interrupt(pasid, context_id0, 32);
-   else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE)
+   else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE &&
+
KFD_DBG_EC_TYPE_IS_PACKET(KFD_CTXID0_CP_BAD_OP_ECODE(context_id0)))
kfd_set_dbg_ev_from_interrupt(dev, pasid,
KFD_CTXID0_DOORBELL_ID(context_id0),

KFD_EC_MASK(KFD_CTXID0_CP_BAD_OP_ECODE(context_id0)),
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index ff7392336795..5483211c5d3d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -388,7 +388,8 @@ static void event_interrupt_wq_v9(struct kfd_node *dev,
break;
}
kfd_signal_event_interrupt(pasid, sq_int_data, 24);
-   } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE) {
+   } else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE &&
+  
KFD_DBG_EC_TYPE_IS_PACKET(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0))) {
kfd_set_dbg_ev_from_interrupt(dev, pasid,
KFD_DEBUG_DOORBELL_ID(context_id0),

KFD_EC_MASK(KFD_DEBUG_CP_BAD_OP_ECODE(context_id0)),
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 9ce46edc62a5..2040a470ddb4 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -913,14 +913,25 @@ enum kfd_dbg_trap_exception_code {
 KFD_EC_MASK(EC_DEVICE_NEW))
 #define KFD_EC_MASK_PROCESS(KFD_EC_MASK(EC_PROCESS_RUNTIME) |  \
 KFD_EC_MASK(EC_PROCESS_DEVICE_REMOVE))
+#define KFD_EC_MASK_PACKET 
(KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_DIM_INVALID) |\
+
KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_GROUP_SEGMENT_SIZE_INVALID) | \
+
KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_CODE_INVALID) |   \
+KFD_EC_MASK(EC_QUEUE_PACKET_RESERVED) |
\
+KFD_EC_MASK(EC_QUEUE_PACKET_UNSUPPORTED) | 
\
+
KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_WORK_GROUP_SIZE_INVALID) |\
+
KFD_EC_MASK(EC_QUEUE_PACKET_DISPATCH_REGISTER_INVALID) |   \
+
KFD_EC_MASK(EC_QUEUE_PACKET_VENDOR_UNSUPPORTED))
 
 /* Checks for exception code types for KFD search */
+#define KFD_DBG_EC_IS_VALID(ecode) (ecode > EC_NONE && ecode < EC_MAX)
 #define KFD_DBG_EC_TYPE_IS_QUEUE(ecode)
\
-   (!!(KFD_EC_MASK(ecode) & KFD_EC_MASK_QUEUE))
+   (KFD_DBG_EC_IS_VALID(ecode) && !!(KFD_EC_MASK(ecode) & 
KFD_EC_MASK_QUEUE))
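
The net effect of the new guard is easier to see in isolation. A throwaway
userspace sketch of the same valid-range-plus-mask test (toy enum values only;
the real codes and masks live in include/uapi/linux/kfd_ioctl.h):

#include <stdio.h>

/* Toy stand-ins for the uapi exception codes. */
enum ec {
	EC_NONE = 0,
	EC_QUEUE_PACKET_RESERVED = 1,
	EC_QUEUE_PACKET_UNSUPPORTED = 2,
	EC_MAX = 3,
};

#define EC_MASK(e)	(1ULL << (e))
#define EC_MASK_PACKET	(EC_MASK(EC_QUEUE_PACKET_RESERVED) | \
			 EC_MASK(EC_QUEUE_PACKET_UNSUPPORTED))
#define EC_IS_VALID(e)	((e) > EC_NONE && (e) < EC_MAX)
#define EC_TYPE_IS_PACKET(e) \
	(EC_IS_VALID(e) && !!(EC_MASK(e) & EC_MASK_PACKET))

int main(void)
{
	/* A sane code passes; a garbage code from the bad interrupt
	 * payload (say 40) fails the range check and never reaches
	 * the mask test, so no debug event would be raised for it. */
	printf("%d %d\n", EC_TYPE_IS_PACKET(2), EC_TYPE_IS_PACKET(40));
	return 0;	/* prints "1 0" */
}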
 

Re: [PATCH] drm/amd/display: Get min/max vfreq from display_info

2024-03-13 Thread Hamza Mahfooz

On 3/12/24 09:47, Harry Wentland wrote:

We need the min/max vfreq on the amdgpu_dm_connector in order to
program VRR.

Fixes: db3e4f1cbb84 ("drm/amd/display: Use freesync when 
`DRM_EDID_FEATURE_CONTINUOUS_FREQ` found")
Signed-off-by: Harry Wentland 


Acked-by: Hamza Mahfooz 


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b1ca0aee0b30..cffb2655177c 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -11278,12 +11278,15 @@ void amdgpu_dm_update_freesync_caps(struct 
drm_connector *connector,
  
  		if (is_dp_capable_without_timing_msa(adev->dm.dc,

 amdgpu_dm_connector)) {
-   if (edid->features & DRM_EDID_FEATURE_CONTINUOUS_FREQ)
+   if (edid->features & DRM_EDID_FEATURE_CONTINUOUS_FREQ) {
freesync_capable = true;
-   else
+   amdgpu_dm_connector->min_vfreq = 
connector->display_info.monitor_range.min_vfreq;
+   amdgpu_dm_connector->max_vfreq = 
connector->display_info.monitor_range.max_vfreq;
+   } else {
edid_check_required = edid->version > 1 ||
  (edid->version == 1 &&
   edid->revision > 1);
+   }
}
  
  		if (edid_check_required) {

--
Hamza



RE: [PATCH] drm/amd/display: Get min/max vfreq from display_info

2024-03-13 Thread Wheeler, Daniel

Hi all,

I can confirm that this re-enables VRR for a RX6800, and a RX7900XTX.

Tested-by: Daniel Wheeler 

Thank you,

Dan Wheeler
Sr. Technologist  |  AMD
SW Display


-Original Message-
From: amd-gfx  On Behalf Of Harry 
Wentland
Sent: Tuesday, March 12, 2024 11:29 AM
To: Alex Deucher 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amd/display: Get min/max vfreq from display_info



On 2024-03-12 10:58, Alex Deucher wrote:
> On Tue, Mar 12, 2024 at 9:57 AM Harry Wentland  wrote:
>>
>> We need the min/max vfreq on the amdgpu_dm_connector in order to
>> program VRR.
>>
>> Fixes: db3e4f1cbb84 ("drm/amd/display: Use freesync when
>> `DRM_EDID_FEATURE_CONTINUOUS_FREQ` found")
>> Signed-off-by: Harry Wentland 
>> ---
>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index b1ca0aee0b30..cffb2655177c 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -11278,12 +11278,15 @@ void amdgpu_dm_update_freesync_caps(struct
>> drm_connector *connector,
>>
>> if (is_dp_capable_without_timing_msa(adev->dm.dc,
>>  amdgpu_dm_connector)) {
>> -   if (edid->features & 
>> DRM_EDID_FEATURE_CONTINUOUS_FREQ)
>> +   if (edid->features &
>> + DRM_EDID_FEATURE_CONTINUOUS_FREQ) {
>> freesync_capable = true;
>> -   else
>> +   amdgpu_dm_connector->min_vfreq = 
>> connector->display_info.monitor_range.min_vfreq;
>> +   amdgpu_dm_connector->max_vfreq =
>> + connector->display_info.monitor_range.max_vfreq;
>
> Does this need special handling for DRM_EDID_RANGE_OFFSET_MIN_VFREQ
> and DRM_EDID_RANGE_OFFSET_MAX_VFREQ as well (similar to the code below
> it)?
>

get_monitor_range in drm_edid.c already handles it. I'm actually wondering if 
the "else" and "edid_check_required" case is still required now, as it 
essentially just duplicates the drm_edid code. But I don't want to rip it out 
in the same patch and without a bit of testing.

Harry

> Alex
>
>> +   } else {
>> edid_check_required = edid->version > 1 ||
>>   (edid->version == 1 &&
>>edid->revision
>> > 1);
>> +   }
>> }
>>
>> if (edid_check_required) {
>> --
>> 2.44.0
>>



Re: [PATCH 1/2] drm/amdgpu/pm: Fix NULL pointer dereference when get power limit

2024-03-13 Thread Alex Deucher
On Wed, Mar 13, 2024 at 7:07 AM Ma Jun  wrote:
>
> Because powerplay_table initialization is skipped in the
> SR-IOV case, check and set default lower and upper OD
> values if powerplay_table is NULL.
>
> Fixes: 7968e9748fbb ("drm/amdgpu/pm: Fix the power1_min_cap value")
> Signed-off-by: Ma Jun 
> Reported-by: Yin Zhenguo 
> Suggested-by: Lazar Lijo 
> Suggested-by: Alex Deucher 

Series is:
Reviewed-by: Alex Deucher 

> ---
>  .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c| 14 --
>  drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c  | 16 +---
>  .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c  | 14 --
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 14 --
>  .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 14 --
>  5 files changed, 41 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> index 1d96eb274d72..a406372e79d8 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
> @@ -1286,7 +1286,7 @@ static int arcturus_get_power_limit(struct smu_context 
> *smu,
> struct smu_11_0_powerplay_table *powerplay_table =
> (struct smu_11_0_powerplay_table 
> *)smu->smu_table.power_play_table;
> PPTable_t *pptable = smu->smu_table.driver_pptable;
> -   uint32_t power_limit, od_percent_upper, od_percent_lower;
> +   uint32_t power_limit, od_percent_upper = 0, od_percent_lower = 0;
>
> if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
> /* the last hope to figure out the ppt limit */
> @@ -1303,12 +1303,14 @@ static int arcturus_get_power_limit(struct 
> smu_context *smu,
> if (default_power_limit)
> *default_power_limit = power_limit;
>
> -   if (smu->od_enabled)
> -   od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   else
> -   od_percent_upper = 0;
> +   if (powerplay_table) {
> +   if (smu->od_enabled)
> +   od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   else
> +   od_percent_upper = 0;
>
> -   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   }
>
> dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
> (default power: %d)\n",
> od_percent_upper, 
> od_percent_lower, power_limit);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> index ed189a3878eb..65bba5fc2335 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
> @@ -2339,7 +2339,7 @@ static int navi10_get_power_limit(struct smu_context 
> *smu,
> (struct smu_11_0_powerplay_table 
> *)smu->smu_table.power_play_table;
> struct smu_11_0_overdrive_table *od_settings = smu->od_settings;
> PPTable_t *pptable = smu->smu_table.driver_pptable;
> -   uint32_t power_limit, od_percent_upper, od_percent_lower;
> +   uint32_t power_limit, od_percent_upper = 0, od_percent_lower = 0;
>
> if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
> /* the last hope to figure out the ppt limit */
> @@ -2356,13 +2356,15 @@ static int navi10_get_power_limit(struct smu_context 
> *smu,
> if (default_power_limit)
> *default_power_limit = power_limit;
>
> -   if (smu->od_enabled &&
> -   navi10_od_feature_is_supported(od_settings, 
> SMU_11_0_ODCAP_POWER_LIMIT))
> -   od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> -   else
> -   od_percent_upper = 0;
> +   if (powerplay_table) {
> +   if (smu->od_enabled &&
> +   navi10_od_feature_is_supported(od_settings, 
> SMU_11_0_ODCAP_POWER_LIMIT))
> +   od_percent_upper = 
> le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   else
> +   od_percent_upper = 0;
>
> -   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   od_percent_lower = 
> le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
> +   }
>
> dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
> (default power: %d)\n",
> 

Re: [PATCH] drm/amdgpu/vpe: power on vpe when hw_init

2024-03-13 Thread Alex Deucher
On Wed, Mar 13, 2024 at 7:41 AM Peyton Lee  wrote:
>
> Power on VPE during hw_init to fix a mode2 reset failure.

When does it get powered down again?  Won't this leave it powered up?

Alex

>
> Signed-off-by: Peyton Lee 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> index 70c5cc80ecdc..ecfe0f36e83e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
> @@ -396,6 +396,12 @@ static int vpe_hw_init(void *handle)
> struct amdgpu_vpe *vpe = &adev->vpe;
> int ret;
>
> +   /* Power on VPE */
> +   ret = amdgpu_device_ip_set_powergating_state(adev, 
> AMD_IP_BLOCK_TYPE_VPE,
> +AMD_PG_STATE_UNGATE);
> +   if (ret)
> +   return ret;
> +
> ret = vpe_load_microcode(vpe);
> if (ret)
> return ret;
> --
> 2.34.1
>


[PATCH] drm/amdgpu/vpe: power on vpe when hw_init

2024-03-13 Thread Peyton Lee
Power on VPE during hw_init to fix a mode2 reset failure.

Signed-off-by: Peyton Lee 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
index 70c5cc80ecdc..ecfe0f36e83e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c
@@ -396,6 +396,12 @@ static int vpe_hw_init(void *handle)
struct amdgpu_vpe *vpe = &adev->vpe;
int ret;
 
+   /* Power on VPE */
+   ret = amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VPE,
+AMD_PG_STATE_UNGATE);
+   if (ret)
+   return ret;
+
ret = vpe_load_microcode(vpe);
if (ret)
return ret;
-- 
2.34.1



[PATCH 9/9] drm/amd/pm: Remove unused interface to set plpd

2024-03-13 Thread Lijo Lazar
Remove unused callback to set PLPD policy and its implementation from
arcturus, aldebaran and SMUv13.0.6 SOCs.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  6 ---
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 22 ---
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 24 
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 39 ---
 4 files changed, 91 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 5970b99c8f4e..65b2eb27582c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -874,12 +874,6 @@ struct pptable_funcs {
 */
int (*set_df_cstate)(struct smu_context *smu, enum pp_df_cstate state);
 
-   /**
-* @select_xgmi_plpd_policy: Select xgmi per-link power down policy.
-*/
-   int (*select_xgmi_plpd_policy)(struct smu_context *smu,
-  enum pp_xgmi_plpd_mode mode);
-
/**
 * @update_pcie_parameters: Update and upload the system's PCIe
 *  capabilites to the SMU.
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 943c52d8d0b2..7900e1a84999 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -2235,27 +2235,6 @@ static int arcturus_set_df_cstate(struct smu_context 
*smu,
return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_DFCstateControl, 
state, NULL);
 }
 
-static int arcturus_select_xgmi_plpd_policy(struct smu_context *smu,
-   enum pp_xgmi_plpd_mode mode)
-{
-   /* PPSMC_MSG_GmiPwrDnControl is supported by 54.23.0 and onwards */
-   if (smu->smc_fw_version < 0x00361700) {
-   dev_err(smu->adev->dev, "XGMI power down control is only 
supported by PMFW 54.23.0 and onwards\n");
-   return -EINVAL;
-   }
-
-   if (mode == XGMI_PLPD_DEFAULT)
-   return smu_cmn_send_smc_msg_with_param(smu,
-  SMU_MSG_GmiPwrDnControl,
-  1, NULL);
-   else if (mode == XGMI_PLPD_DISALLOW)
-   return smu_cmn_send_smc_msg_with_param(smu,
-  SMU_MSG_GmiPwrDnControl,
-  0, NULL);
-   else
-   return -EINVAL;
-}
-
 static const struct throttling_logging_label {
uint32_t feature_mask;
const char *label;
@@ -2453,7 +2432,6 @@ static const struct pptable_funcs arcturus_ppt_funcs = {
.get_dpm_ultimate_freq = smu_v11_0_get_dpm_ultimate_freq,
.set_soft_freq_limited_range = smu_v11_0_set_soft_freq_limited_range,
.set_df_cstate = arcturus_set_df_cstate,
-   .select_xgmi_plpd_policy = arcturus_select_xgmi_plpd_policy,
.log_thermal_throttling_event = arcturus_log_thermal_throttling_event,
.get_pp_feature_mask = smu_cmn_get_pp_feature_mask,
.set_pp_feature_mask = smu_cmn_set_pp_feature_mask,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index a0efc3c39e64..94a201f8a790 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -1639,29 +1639,6 @@ static int aldebaran_set_df_cstate(struct smu_context 
*smu,
return smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_DFCstateControl, 
state, NULL);
 }
 
-static int aldebaran_select_xgmi_plpd_policy(struct smu_context *smu,
-enum pp_xgmi_plpd_mode mode)
-{
-   struct amdgpu_device *adev = smu->adev;
-
-   /* The message only works on master die and NACK will be sent
-  back for other dies, only send it on master die */
-   if (adev->smuio.funcs->get_socket_id(adev) ||
-   adev->smuio.funcs->get_die_id(adev))
-   return 0;
-
-   if (mode == XGMI_PLPD_DEFAULT)
-   return smu_cmn_send_smc_msg_with_param(smu,
-  SMU_MSG_GmiPwrDnControl,
-  0, NULL);
-   else if (mode == XGMI_PLPD_DISALLOW)
-   return smu_cmn_send_smc_msg_with_param(smu,
-  SMU_MSG_GmiPwrDnControl,
-  1, NULL);
-   else
-   return -EINVAL;
-}
-
 static const struct throttling_logging_label {
uint32_t feature_mask;
const char *label;
@@ -2100,7 +2077,6 @@ static const struct pptable_funcs aldebaran_ppt_funcs = {
.set_soft_freq_limited_range = 

[PATCH 8/9] drm/amd/pm: Remove legacy interface for xgmi plpd

2024-03-13 Thread Lijo Lazar
Replace the legacy interface with amdgpu_dpm_set_pm_policy to set XGMI
PLPD mode. Also, xgmi_plpd sysfs node is not used by any client. Remove
that as well.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c  |  4 +-
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 43 
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 68 ---
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  5 --
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 27 
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  2 -
 6 files changed, 2 insertions(+), 147 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 20d51f6c9bb8..ecac9607a819 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -1429,7 +1429,7 @@ static int amdgpu_ras_error_inject_xgmi(struct 
amdgpu_device *adev,
if (amdgpu_dpm_set_df_cstate(adev, DF_CSTATE_DISALLOW))
dev_warn(adev->dev, "Failed to disallow df cstate");
 
-   ret1 = amdgpu_dpm_set_xgmi_plpd_mode(adev, XGMI_PLPD_DISALLOW);
+   ret1 = amdgpu_dpm_set_pm_policy(adev, PP_PM_POLICY_XGMI_PLPD, 
XGMI_PLPD_DISALLOW);
if (ret1 && ret1 != -EOPNOTSUPP)
dev_warn(adev->dev, "Failed to disallow XGMI power down");
 
@@ -1438,7 +1438,7 @@ static int amdgpu_ras_error_inject_xgmi(struct 
amdgpu_device *adev,
if (amdgpu_ras_intr_triggered())
return ret2;
 
-   ret1 = amdgpu_dpm_set_xgmi_plpd_mode(adev, XGMI_PLPD_DEFAULT);
+   ret1 = amdgpu_dpm_set_pm_policy(adev, PP_PM_POLICY_XGMI_PLPD, 
XGMI_PLPD_DEFAULT);
if (ret1 && ret1 != -EOPNOTSUPP)
dev_warn(adev->dev, "Failed to allow XGMI power down");
 
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index db3addd07120..21d05fab20d5 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -368,49 +368,6 @@ int amdgpu_dpm_set_df_cstate(struct amdgpu_device *adev,
return ret;
 }
 
-int amdgpu_dpm_get_xgmi_plpd_mode(struct amdgpu_device *adev, char **mode_desc)
-{
-   struct smu_context *smu = adev->powerplay.pp_handle;
-   int mode = XGMI_PLPD_NONE;
-
-   if (is_support_sw_smu(adev)) {
-   mode = smu->plpd_mode;
-   if (mode_desc == NULL)
-   return mode;
-   switch (smu->plpd_mode) {
-   case XGMI_PLPD_DISALLOW:
-   *mode_desc = "disallow";
-   break;
-   case XGMI_PLPD_DEFAULT:
-   *mode_desc = "default";
-   break;
-   case XGMI_PLPD_OPTIMIZED:
-   *mode_desc = "optimized";
-   break;
-   case XGMI_PLPD_NONE:
-   default:
-   *mode_desc = "none";
-   break;
-   }
-   }
-
-   return mode;
-}
-
-int amdgpu_dpm_set_xgmi_plpd_mode(struct amdgpu_device *adev, int mode)
-{
-   struct smu_context *smu = adev->powerplay.pp_handle;
-   int ret = -EOPNOTSUPP;
-
-   if (is_support_sw_smu(adev)) {
-   mutex_lock(&adev->pm.mutex);
-   ret = smu_set_xgmi_plpd_mode(smu, mode);
-   mutex_unlock(&adev->pm.mutex);
-   }
-
-   return ret;
-}
-
 ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char *buf)
 {
struct smu_context *smu = adev->powerplay.pp_handle;
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 1d5a8428601d..f7a289440c4c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2091,70 +2091,6 @@ static int pp_dpm_dcefclk_attr_update(struct 
amdgpu_device *adev, struct amdgpu_
return 0;
 }
 
-/* Following items will be read out to indicate current plpd policy:
- *  - -1: none
- *  - 0: disallow
- *  - 1: default
- *  - 2: optimized
- */
-static ssize_t amdgpu_get_xgmi_plpd_policy(struct device *dev,
-  struct device_attribute *attr,
-  char *buf)
-{
-   struct drm_device *ddev = dev_get_drvdata(dev);
-   struct amdgpu_device *adev = drm_to_adev(ddev);
-   char *mode_desc = "none";
-   int mode;
-
-   if (amdgpu_in_reset(adev))
-   return -EPERM;
-   if (adev->in_suspend && !adev->in_runpm)
-   return -EPERM;
-
-   mode = amdgpu_dpm_get_xgmi_plpd_mode(adev, &mode_desc);
-
-   return sysfs_emit(buf, "%d: %s\n", mode, mode_desc);
-}
-
-/* Following argument value is expected from user to change plpd policy
- *  - arg 0: disallow plpd
- *  - arg 1: default policy
- *  - arg 2: optimized policy
- */
-static ssize_t amdgpu_set_xgmi_plpd_policy(struct device *dev,
-  struct 

[PATCH 6/9] drm/amd/pm: Add xgmi plpd to aldebaran pm_policy

2024-03-13 Thread Lijo Lazar
On aldebaran, allow changing xgmi plpd policy through pm_policy sysfs
interface.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c| 35 +++
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
index 0467864a1aa8..a0efc3c39e64 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/aldebaran_ppt.c
@@ -266,9 +266,30 @@ static int aldebaran_tables_init(struct smu_context *smu)
return 0;
 }
 
+static int aldebaran_select_plpd_policy(struct smu_context *smu, int level)
+{
+   struct amdgpu_device *adev = smu->adev;
+
+   /* The message only works on master die and NACK will be sent
+  back for other dies, only send it on master die */
+   if (adev->smuio.funcs->get_socket_id(adev) ||
+   adev->smuio.funcs->get_die_id(adev))
+   return 0;
+
+   if (level == XGMI_PLPD_DEFAULT)
+   return smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_GmiPwrDnControl, 0, NULL);
+   else if (level == XGMI_PLPD_DISALLOW)
+   return smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_GmiPwrDnControl, 1, NULL);
+   else
+   return -EINVAL;
+}
+
 static int aldebaran_allocate_dpm_context(struct smu_context *smu)
 {
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
+   struct smu_dpm_policy *policy;
 
smu_dpm->dpm_context = kzalloc(sizeof(struct smu_13_0_dpm_context),
   GFP_KERNEL);
@@ -276,6 +297,20 @@ static int aldebaran_allocate_dpm_context(struct 
smu_context *smu)
return -ENOMEM;
smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context);
 
+   smu_dpm->dpm_policies =
+   kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL);
+
+   if (!smu_dpm->dpm_policies)
+   return -ENOMEM;
+
+   policy = &(smu_dpm->dpm_policies->policies[0]);
+   policy->policy_type = PP_PM_POLICY_XGMI_PLPD;
+   policy->level_mask = BIT(XGMI_PLPD_DISALLOW) | BIT(XGMI_PLPD_DEFAULT);
+   policy->current_level = XGMI_PLPD_DEFAULT;
+   policy->set_policy = aldebaran_select_plpd_policy;
+   smu_cmn_generic_plpd_policy_desc(policy);
+   smu_dpm->dpm_policies->policy_mask |= BIT(PP_PM_POLICY_XGMI_PLPD);
+
return 0;
 }
 
-- 
2.25.1



[PATCH 7/9] drm/amd/pm: Add xgmi plpd to arcturus pm_policy

2024-03-13 Thread Lijo Lazar
On arcturus, allow changing xgmi plpd policy through pm_policy sysfs
interface.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c |  7 ++--
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 42 +++
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 42d43da7de4c..1c24f2cc5b29 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1192,19 +1192,20 @@ static void smu_init_xgmi_plpd_mode(struct smu_context 
*smu)
 {
struct smu_dpm_context *dpm_ctxt = &(smu->smu_dpm);
struct smu_dpm_policy_ctxt *policy_ctxt;
+   struct smu_dpm_policy *policy;
 
+   policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD);
if (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(11, 0, 2)) {
smu->plpd_mode = XGMI_PLPD_DEFAULT;
+   if (policy)
+   policy->current_level = XGMI_PLPD_DEFAULT;
return;
}
 
/* PMFW put PLPD into default policy after enabling the feature */
if (smu_feature_is_enabled(smu,
   SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT)) {
-   struct smu_dpm_policy *policy;
-
smu->plpd_mode = XGMI_PLPD_DEFAULT;
-   policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD);
if (policy)
policy->current_level = XGMI_PLPD_DEFAULT;
} else {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 1d96eb274d72..943c52d8d0b2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -283,9 +283,29 @@ static int arcturus_tables_init(struct smu_context *smu)
return 0;
 }
 
+static int arcturus_select_plpd_policy(struct smu_context *smu, int level)
+{
+   /* PPSMC_MSG_GmiPwrDnControl is supported by 54.23.0 and onwards */
+   if (smu->smc_fw_version < 0x00361700) {
+   dev_err(smu->adev->dev,
+   "XGMI power down control is only supported by PMFW 
54.23.0 and onwards\n");
+   return -EINVAL;
+   }
+
+   if (level == XGMI_PLPD_DEFAULT)
+   return smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_GmiPwrDnControl, 1, NULL);
+   else if (level == XGMI_PLPD_DISALLOW)
+   return smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_GmiPwrDnControl, 0, NULL);
+   else
+   return -EINVAL;
+}
+
 static int arcturus_allocate_dpm_context(struct smu_context *smu)
 {
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
+   struct smu_dpm_policy *policy;
 
smu_dpm->dpm_context = kzalloc(sizeof(struct smu_11_0_dpm_context),
   GFP_KERNEL);
@@ -293,6 +313,20 @@ static int arcturus_allocate_dpm_context(struct 
smu_context *smu)
return -ENOMEM;
smu_dpm->dpm_context_size = sizeof(struct smu_11_0_dpm_context);
 
+   smu_dpm->dpm_policies =
+   kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL);
+
+   if (!smu_dpm->dpm_policies)
+   return -ENOMEM;
+
+   policy = &(smu_dpm->dpm_policies->policies[0]);
+   policy->policy_type = PP_PM_POLICY_XGMI_PLPD;
+   policy->level_mask = BIT(XGMI_PLPD_DISALLOW) | BIT(XGMI_PLPD_DEFAULT);
+   policy->current_level = XGMI_PLPD_DEFAULT;
+   policy->set_policy = arcturus_select_plpd_policy;
+   smu_cmn_generic_plpd_policy_desc(policy);
+   smu_dpm->dpm_policies->policy_mask |= BIT(PP_PM_POLICY_XGMI_PLPD);
+
return 0;
 }
 
@@ -403,6 +437,14 @@ static int arcturus_set_default_dpm_table(struct 
smu_context *smu)
dpm_table->max = dpm_table->dpm_levels[0].value;
}
 
+   /* XGMI PLPD is supported by 54.23.0 and onwards */
+   if (smu->smc_fw_version < 0x00361700) {
+   struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
+
+   smu_dpm->dpm_policies->policy_mask &=
+   ~BIT(PP_PM_POLICY_XGMI_PLPD);
+   }
+
return 0;
 }
 
-- 
2.25.1
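
One small aside on the 0x00361700 threshold that appears twice above: it is
just the "54.23.0" from the comment with one byte per version component,
assuming the usual major/minor/patch packing of smc_fw_version. A quick check:

#include <stdio.h>

int main(void)
{
	unsigned int fw = 0x00361700;	/* threshold used in the patch */

	/* assumed packing: 0x00MMmmpp -> major.minor.patch */
	printf("%u.%u.%u\n", (fw >> 16) & 0xff, (fw >> 8) & 0xff, fw & 0xff);
	return 0;	/* prints 54.23.0 */
}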



[PATCH 4/9] drm/amd/pm: Add xgmi plpd policy to pm_policy

2024-03-13 Thread Lijo Lazar
Add support to set XGMI PLPD policy levels through pm_policy sysfs node.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/include/kgd_pp_interface.h | 1 +
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index e48da7acd7a7..45c78cd08eae 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -275,6 +275,7 @@ enum pp_xgmi_plpd_mode {
 enum pp_pm_policy {
PP_PM_POLICY_NONE = -1,
PP_PM_POLICY_SOC_PSTATE = 0,
+   PP_PM_POLICY_XGMI_PLPD,
PP_PM_POLICY_NUM,
 };
 
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index d8c8eaff3355..1d5a8428601d 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2195,6 +2195,9 @@ static ssize_t amdgpu_set_pm_policy(struct device *dev,
if (strncmp(tmp, "soc_pstate", strlen("soc_pstate")) == 0) {
policy_type = PP_PM_POLICY_SOC_PSTATE;
tmp += strlen("soc_pstate");
+   } else if (strncmp(tmp, "xgmi", strlen("xgmi")) == 0) {
+   policy_type = PP_PM_POLICY_XGMI_PLPD;
+   tmp += strlen("xgmi");
} else {
return -EINVAL;
}
-- 
2.25.1



[PATCH 5/9] drm/amd/pm: Add xgmi plpd to SMU v13.0.6 pm_policy

2024-03-13 Thread Lijo Lazar
On SOCs with SMU v13.0.6, allow changing xgmi plpd policy through
pm_policy sysfs interface.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 15 +-
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 51 +--
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 27 ++
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h|  1 +
 4 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 1c23e0985377..42d43da7de4c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1190,6 +1190,9 @@ static void smu_swctf_delayed_work_handler(struct 
work_struct *work)
 
 static void smu_init_xgmi_plpd_mode(struct smu_context *smu)
 {
+   struct smu_dpm_context *dpm_ctxt = &(smu->smu_dpm);
+   struct smu_dpm_policy_ctxt *policy_ctxt;
+
if (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(11, 0, 2)) {
smu->plpd_mode = XGMI_PLPD_DEFAULT;
return;
@@ -1197,10 +1200,18 @@ static void smu_init_xgmi_plpd_mode(struct smu_context 
*smu)
 
/* PMFW put PLPD into default policy after enabling the feature */
if (smu_feature_is_enabled(smu,
-  SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT))
+  SMU_FEATURE_XGMI_PER_LINK_PWR_DWN_BIT)) {
+   struct smu_dpm_policy *policy;
+
smu->plpd_mode = XGMI_PLPD_DEFAULT;
-   else
+   policy = smu_get_pm_policy(smu, PP_PM_POLICY_XGMI_PLPD);
+   if (policy)
+   policy->current_level = XGMI_PLPD_DEFAULT;
+   } else {
smu->plpd_mode = XGMI_PLPD_NONE;
+   policy_ctxt = dpm_ctxt->dpm_policies;
+   policy_ctxt->policy_mask &= ~BIT(PP_PM_POLICY_XGMI_PLPD);
+   }
 }
 
 static int smu_sw_init(void *handle)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index f227f05100f8..2d6428ca1217 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -402,9 +402,45 @@ static int smu_v13_0_6_select_policy_soc_pstate(struct 
smu_context *smu,
return ret;
 }
 
+static int smu_v13_0_6_select_plpd_policy(struct smu_context *smu, int level)
+{
+   struct amdgpu_device *adev = smu->adev;
+   int ret, param;
+
+   switch (level) {
+   case XGMI_PLPD_DEFAULT:
+   param = PPSMC_PLPD_MODE_DEFAULT;
+   break;
+   case XGMI_PLPD_OPTIMIZED:
+   param = PPSMC_PLPD_MODE_OPTIMIZED;
+   break;
+   case XGMI_PLPD_DISALLOW:
+   param = 0;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (level == XGMI_PLPD_DISALLOW)
+   ret = smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_GmiPwrDnControl, param, NULL);
+   else
+   /* change xgmi per-link power down policy */
+   ret = smu_cmn_send_smc_msg_with_param(
+   smu, SMU_MSG_SelectPLPDMode, param, NULL);
+
+   if (ret)
+   dev_err(adev->dev,
+   "select xgmi per-link power down policy %d failed\n",
+   level);
+
+   return ret;
+}
+
 static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu)
 {
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
+   struct smu_dpm_policy *policy;
 
smu_dpm->dpm_context =
kzalloc(sizeof(struct smu_13_0_dpm_context), GFP_KERNEL);
@@ -412,11 +448,9 @@ static int smu_v13_0_6_allocate_dpm_context(struct 
smu_context *smu)
return -ENOMEM;
smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context);
 
+   smu_dpm->dpm_policies =
+   kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL);
if (!(smu->adev->flags & AMD_IS_APU)) {
-   struct smu_dpm_policy *policy;
-
-   smu_dpm->dpm_policies =
-   kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL);
policy = &(smu_dpm->dpm_policies->policies[0]);
 
policy->policy_type = PP_PM_POLICY_SOC_PSTATE;
@@ -429,6 +463,15 @@ static int smu_v13_0_6_allocate_dpm_context(struct 
smu_context *smu)
smu_dpm->dpm_policies->policy_mask |=
BIT(PP_PM_POLICY_SOC_PSTATE);
}
+   policy = &(smu_dpm->dpm_policies->policies[1]);
+
+   policy->policy_type = PP_PM_POLICY_XGMI_PLPD;
+   policy->level_mask = BIT(XGMI_PLPD_DISALLOW) | BIT(XGMI_PLPD_DEFAULT) |
+BIT(XGMI_PLPD_OPTIMIZED);
+   policy->current_level = XGMI_PLPD_DEFAULT;
+   policy->set_policy = smu_v13_0_6_select_plpd_policy;
+   

[PATCH 2/9] drm/amd/pm: Update PMFW messages for SMUv13.0.6

2024-03-13 Thread Lijo Lazar
Add a PMFW message to select a Pstate policy in SOCs with SMU v13.0.6.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 3 ++-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h
index 86758051cb93..41cb681927e2 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h
@@ -92,7 +92,8 @@
 #define PPSMC_MSG_McaBankCeDumpDW   0x3B
 #define PPSMC_MSG_SelectPLPDMode0x40
 #define PPSMC_MSG_RmaDueToBadPageThreshold  0x43
-#define PPSMC_Message_Count 0x44
+#define PPSMC_MSG_SelectPstatePolicy0x44
+#define PPSMC_Message_Count 0x45
 
 //PPSMC Reset Types for driver msg argument
 #define PPSMC_RESET_TYPE_DRIVER_MODE_1_RESET0x1
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index a941fdbf78b6..dadbacde32ab 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -262,7 +262,8 @@
__SMU_DUMMY_MAP(SetSoftMinVpe), \
__SMU_DUMMY_MAP(GetMetricsVersion), \
__SMU_DUMMY_MAP(EnableUCLKShadow), \
-   __SMU_DUMMY_MAP(RmaDueToBadPageThreshold),
+   __SMU_DUMMY_MAP(RmaDueToBadPageThreshold),\
+   __SMU_DUMMY_MAP(SelectPstatePolicy),
 
 #undef __SMU_DUMMY_MAP
 #define __SMU_DUMMY_MAP(type)  SMU_MSG_##type
-- 
2.25.1



[PATCH 3/9] drm/amd/pm: Add support to select pstate policy

2024-03-13 Thread Lijo Lazar
Add support to select pstate policy in SOCs with SMUv13.0.6

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|  2 +
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 71 +++
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 30 
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h|  1 +
 4 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 48170bb5112e..8a86c74f28e7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -530,10 +530,12 @@ int smu_v13_0_fini_smc_tables(struct smu_context *smu)
smu_table->watermarks_table = NULL;
smu_table->metrics_time = 0;
 
+   kfree(smu_dpm->dpm_policies);
kfree(smu_dpm->dpm_context);
kfree(smu_dpm->golden_dpm_context);
kfree(smu_dpm->dpm_current_power_state);
kfree(smu_dpm->dpm_request_power_state);
+   smu_dpm->dpm_policies = NULL;
smu_dpm->dpm_context = NULL;
smu_dpm->golden_dpm_context = NULL;
smu_dpm->dpm_context_size = 0;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
index 744c84f3029f..f227f05100f8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
@@ -173,6 +173,7 @@ static const struct cmn2asic_msg_mapping 
smu_v13_0_6_message_map[SMU_MSG_MAX_COU
MSG_MAP(McaBankCeDumpDW, PPSMC_MSG_McaBankCeDumpDW, 
0),
MSG_MAP(SelectPLPDMode,  PPSMC_MSG_SelectPLPDMode,  
0),
MSG_MAP(RmaDueToBadPageThreshold,
PPSMC_MSG_RmaDueToBadPageThreshold,0),
+   MSG_MAP(SelectPstatePolicy,  
PPSMC_MSG_SelectPstatePolicy,  0),
 };
 
 // clang-format on
@@ -368,6 +369,39 @@ static int smu_v13_0_6_tables_init(struct smu_context *smu)
return 0;
 }
 
+static int smu_v13_0_6_select_policy_soc_pstate(struct smu_context *smu,
+   int policy)
+{
+   struct amdgpu_device *adev = smu->adev;
+   int ret, param;
+
+   switch (policy) {
+   case SOC_PSTATE_DEFAULT:
+   param = 0;
+   break;
+   case SOC_PSTATE_0:
+   param = 1;
+   break;
+   case SOC_PSTATE_1:
+   param = 2;
+   break;
+   case SOC_PSTATE_2:
+   param = 3;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SelectPstatePolicy,
+ param, NULL);
+
+   if (ret)
+   dev_err(adev->dev, "select soc pstate policy %d failed",
+   policy);
+
+   return ret;
+}
+
 static int smu_v13_0_6_allocate_dpm_context(struct smu_context *smu)
 {
struct smu_dpm_context *smu_dpm = &smu->smu_dpm;
@@ -378,6 +412,24 @@ static int smu_v13_0_6_allocate_dpm_context(struct 
smu_context *smu)
return -ENOMEM;
smu_dpm->dpm_context_size = sizeof(struct smu_13_0_dpm_context);
 
+   if (!(smu->adev->flags & AMD_IS_APU)) {
+   struct smu_dpm_policy *policy;
+
+   smu_dpm->dpm_policies =
+   kzalloc(sizeof(struct smu_dpm_policy_ctxt), GFP_KERNEL);
+   policy = &(smu_dpm->dpm_policies->policies[0]);
+
+   policy->policy_type = PP_PM_POLICY_SOC_PSTATE;
+   policy->level_mask = BIT(SOC_PSTATE_DEFAULT) |
+BIT(SOC_PSTATE_0) | BIT(SOC_PSTATE_1) |
+BIT(SOC_PSTATE_2);
+   policy->current_level = SOC_PSTATE_DEFAULT;
+   policy->set_policy = smu_v13_0_6_select_policy_soc_pstate;
+   smu_cmn_generic_soc_policy_desc(policy);
+   smu_dpm->dpm_policies->policy_mask |=
+   BIT(PP_PM_POLICY_SOC_PSTATE);
+   }
+
return 0;
 }
 
@@ -636,6 +688,15 @@ static int smu_v13_0_6_get_dpm_level_count(struct 
smu_context *smu,
return ret;
 }
 
+static void smu_v13_0_6_pm_policy_init(struct smu_context *smu)
+{
+   struct smu_dpm_policy *policy;
+
+   policy = smu_get_pm_policy(smu, PP_PM_POLICY_SOC_PSTATE);
+   if (policy)
+   policy->current_level = SOC_PSTATE_DEFAULT;
+}
+
 static int smu_v13_0_6_set_default_dpm_table(struct smu_context *smu)
 {
struct smu_13_0_dpm_context *dpm_context = smu->smu_dpm.dpm_context;
@@ -665,6 +726,16 @@ static int smu_v13_0_6_set_default_dpm_table(struct 
smu_context *smu)
 
smu_v13_0_6_setup_driver_pptable(smu);
 
+   /* DPM policy not supported in older firmwares */
+   if (!(smu->adev->flags & AMD_IS_APU) &&
+   

[PATCH 1/9] drm/amd/pm: Add support for DPM policies

2024-03-13 Thread Lijo Lazar
Add support to set/get information about different DPM policies. The
support is only available on SOCs which use the swsmu architecture.

A DPM policy type may define multiple levels to choose from. For example,
a policy type may be defined for Pstate preference, and a specific Pstate
level may then be selected within it.

Signed-off-by: Lijo Lazar 
Reviewed-by: Hawking Zhang 
---
 .../gpu/drm/amd/include/kgd_pp_interface.h| 16 
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 29 ++
 drivers/gpu/drm/amd/pm/amdgpu_pm.c| 92 ++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  4 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 95 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 29 ++
 6 files changed, 265 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h 
b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
index 32054ecf0b87..e48da7acd7a7 100644
--- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
@@ -272,6 +272,22 @@ enum pp_xgmi_plpd_mode {
XGMI_PLPD_COUNT,
 };
 
+enum pp_pm_policy {
+   PP_PM_POLICY_NONE = -1,
+   PP_PM_POLICY_SOC_PSTATE = 0,
+   PP_PM_POLICY_NUM,
+};
+
+enum pp_policy_soc_pstate {
+   SOC_PSTATE_DEFAULT = 0,
+   SOC_PSTATE_0,
+   SOC_PSTATE_1,
+   SOC_PSTATE_2,
+   SOC_PSTAT_COUNT,
+};
+
+#define PP_POLICY_MAX_LEVELS 5
+
 #define PP_GROUP_MASK0xF000
 #define PP_GROUP_SHIFT   28
 
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index f84bfed50681..db3addd07120 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -411,6 +411,35 @@ int amdgpu_dpm_set_xgmi_plpd_mode(struct amdgpu_device 
*adev, int mode)
return ret;
 }
 
+ssize_t amdgpu_dpm_get_pm_policy_info(struct amdgpu_device *adev, char *buf)
+{
+   struct smu_context *smu = adev->powerplay.pp_handle;
+   int ret = -EOPNOTSUPP;
+
+   if (is_support_sw_smu(adev)) {
+   mutex_lock(&adev->pm.mutex);
+   ret = smu_get_pm_policy_info(smu, buf);
+   mutex_unlock(&adev->pm.mutex);
+   }
+
+   return ret;
+}
+
+int amdgpu_dpm_set_pm_policy(struct amdgpu_device *adev, int policy_type,
+int policy_level)
+{
+   struct smu_context *smu = adev->powerplay.pp_handle;
+   int ret = -EOPNOTSUPP;
+
+   if (is_support_sw_smu(adev)) {
+   mutex_lock(&adev->pm.mutex);
+   ret = smu_set_pm_policy(smu, policy_type, policy_level);
+   mutex_unlock(&adev->pm.mutex);
+   }
+
+   return ret;
+}
+
 int amdgpu_dpm_enable_mgpu_fan_boost(struct amdgpu_device *adev)
 {
void *pp_handle = adev->powerplay.pp_handle;
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index f09b9d49297e..d8c8eaff3355 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2155,6 +2155,96 @@ static ssize_t amdgpu_set_xgmi_plpd_policy(struct device 
*dev,
return count;
 }
 
+static ssize_t amdgpu_get_pm_policy(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = drm_to_adev(ddev);
+
+   if (amdgpu_in_reset(adev))
+   return -EPERM;
+   if (adev->in_suspend && !adev->in_runpm)
+   return -EPERM;
+
+   return amdgpu_dpm_get_pm_policy_info(adev, buf);
+}
+
+static ssize_t amdgpu_set_pm_policy(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct drm_device *ddev = dev_get_drvdata(dev);
+   struct amdgpu_device *adev = drm_to_adev(ddev);
+   int policy_type, ret, num_params = 0;
+   char delimiter[] = " \n\t";
+   char tmp_buf[128];
+   char *tmp, *param;
+   long val;
+
+   if (amdgpu_in_reset(adev))
+   return -EPERM;
+   if (adev->in_suspend && !adev->in_runpm)
+   return -EPERM;
+
+   count = min(count, sizeof(tmp_buf));
+   memcpy(tmp_buf, buf, count);
+   tmp_buf[count - 1] = '\0';
+   tmp = tmp_buf;
+
+   tmp = skip_spaces(tmp);
+   if (strncmp(tmp, "soc_pstate", strlen("soc_pstate")) == 0) {
+   policy_type = PP_PM_POLICY_SOC_PSTATE;
+   tmp += strlen("soc_pstate");
+   } else {
+   return -EINVAL;
+   }
+
+   tmp = skip_spaces(tmp);
+   while ((param = strsep(&tmp, delimiter))) {
+   if (!strlen(param)) {
+   tmp = skip_spaces(tmp);
+   continue;
+   }
+   ret = kstrtol(param, 0, &val);
+   if (ret)
+   return -EINVAL;
+   num_params++;
+   if (num_params > 1)
+   

[PATCH 0/9] Add PM policy interfaces

2024-03-13 Thread Lijo Lazar
This series adds APIs to get the supported PM policies and to set them. A PM
policy type is a predefined policy type supported by an SOC, and each policy may
define two or more levels to choose from. A user can select the appropriate
level through amdgpu_dpm_set_pm_policy() or through the pm_policy sysfs node.
Depending on the PM functional area, multiple PM policies may be defined for an
SOC. For example, one policy may be defined to select the right setting for the
XGMI per-link power down feature and another to select the SOC Pstate
preference.

Presently, the XGMI PLPD and SOC Pstate policy types are supported. The series
also removes the legacy sysfs interface to set XGMI PLPD, as it is not used by
any client such as the SMI tool.
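
To make the intended usage concrete, here is a minimal userspace sketch of
driving the new node (a sketch only: the card0 path is an example, and the
"soc_pstate"/"xgmi" tokens with a numeric level follow the parsers added in
patches 1 and 4; the levels actually accepted depend on what the SOC exposes):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Example node; the exact cardN depends on the system. */
	const char *node = "/sys/class/drm/card0/device/pm_policy";
	const char *cmd = "soc_pstate 2";	/* or an "xgmi <level>" string */
	int fd = open(node, O_WRONLY);

	if (fd < 0) {
		perror("open pm_policy");
		return 1;
	}
	if (write(fd, cmd, strlen(cmd)) < 0)
		perror("write pm_policy");
	close(fd);
	return 0;
}

Reading the same node goes through amdgpu_dpm_get_pm_policy_info() from patch 1
and reports the supported policies.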


Lijo Lazar (9):
  drm/amd/pm: Add support for DPM policies
  drm/amd/pm: Update PMFW messages for SMUv13.0.6
  drm/amd/pm: Add support to select pstate policy
  drm/amd/pm: Add xgmi plpd policy to pm_policy
  drm/amd/pm: Add xgmi plpd to SMU v13.0.6 pm_policy
  drm/amd/pm: Add xgmi plpd to aldebaran pm_policy
  drm/amd/pm: Add xgmi plpd to arcturus pm_policy
  drm/amd/pm: Remove legacy interface for xgmi plpd
  drm/amd/pm: Remove unused interface to set plpd

 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c  |   4 +-
 .../gpu/drm/amd/include/kgd_pp_interface.h|  17 ++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   |  32 ++--
 drivers/gpu/drm/amd/pm/amdgpu_pm.c|  87 ++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |   9 +-
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 108 +++--
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  37 -
 .../pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h  |   3 +-
 drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |   3 +-
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c |  64 +---
 .../drm/amd/pm/swsmu/smu13/aldebaran_ppt.c|  59 ---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c|   2 +
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c  | 153 +-
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c|  57 +++
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h|   2 +
 15 files changed, 468 insertions(+), 169 deletions(-)

-- 
2.25.1



[PATCH 2/2] drm/amdgpu/pm: Check the validity of overdrive power limit

2024-03-13 Thread Ma Jun
Check the validity of the overdrive power limit before using it.

Signed-off-by: Ma Jun 
Suggested-by: Lazar Lijo 
Suggested-by: Alex Deucher 
---
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c | 11 +
 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   |  9 
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 23 +++
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c  | 10 
 .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c  | 10 
 5 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index a406372e79d8..40ba7227cca5 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1285,6 +1285,7 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
 {
struct smu_11_0_powerplay_table *powerplay_table =
(struct smu_11_0_powerplay_table 
*)smu->smu_table.power_play_table;
+   struct smu_11_0_overdrive_table *od_settings = smu->od_settings;
PPTable_t *pptable = smu->smu_table.driver_pptable;
uint32_t power_limit, od_percent_upper = 0, od_percent_lower = 0;
 
@@ -1304,12 +1305,14 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
*default_power_limit = power_limit;
 
if (powerplay_table) {
-   if (smu->od_enabled)
+   if (smu->od_enabled &&
+   od_settings->cap[SMU_11_0_ODCAP_POWER_LIMIT]) {
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   else
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   } else if (od_settings->cap[SMU_11_0_ODCAP_POWER_LIMIT]) {
od_percent_upper = 0;
-
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   }
}
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index 65bba5fc2335..836b1df79928 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2358,12 +2358,13 @@ static int navi10_get_power_limit(struct smu_context 
*smu,
 
if (powerplay_table) {
if (smu->od_enabled &&
-   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT))
+   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT)) {
od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   else
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   } else if (navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT)) {
od_percent_upper = 0;
-
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   }
}
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 9371e6e79c56..86315e1cf81b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -617,6 +617,12 @@ static uint32_t 
sienna_cichlid_get_throttler_status_locked(struct smu_context *s
return throttler_status;
 }
 
+static bool sienna_cichlid_is_od_feature_supported(struct 
smu_11_0_7_overdrive_table *od_table,
+  enum 
SMU_11_0_7_ODFEATURE_CAP cap)
+{
+   return od_table->cap[cap];
+}
+
 static int sienna_cichlid_get_power_limit(struct smu_context *smu,
  uint32_t *current_power_limit,
  uint32_t *default_power_limit,
@@ -625,6 +631,7 @@ static int sienna_cichlid_get_power_limit(struct 
smu_context *smu,
 {
struct smu_11_0_7_powerplay_table *powerplay_table =
(struct smu_11_0_7_powerplay_table 
*)smu->smu_table.power_play_table;
+   struct smu_11_0_7_overdrive_table *od_settings = smu->od_settings;
uint32_t power_limit, 

[PATCH 1/2] drm/amdgpu/pm: Fix NULL pointer dereference when get power limit

2024-03-13 Thread Ma Jun
Because powerplay_table initialization is skipped in the
SR-IOV case, check and set the default lower and upper OD
values when powerplay_table is NULL.

Fixes: 7968e9748fbb ("drm/amdgpu/pm: Fix the power1_min_cap value")
Signed-off-by: Ma Jun 
Reported-by: Yin Zhenguo 
Suggested-by: Lazar Lijo 
Suggested-by: Alex Deucher 
---
 .../gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c| 14 --
 drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c  | 16 +---
 .../drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c  | 14 --
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 14 --
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c | 14 --
 5 files changed, 41 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
index 1d96eb274d72..a406372e79d8 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
@@ -1286,7 +1286,7 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
struct smu_11_0_powerplay_table *powerplay_table =
(struct smu_11_0_powerplay_table 
*)smu->smu_table.power_play_table;
PPTable_t *pptable = smu->smu_table.driver_pptable;
-   uint32_t power_limit, od_percent_upper, od_percent_lower;
+   uint32_t power_limit, od_percent_upper = 0, od_percent_lower = 0;
 
	if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
/* the last hope to figure out the ppt limit */
@@ -1303,12 +1303,14 @@ static int arcturus_get_power_limit(struct smu_context 
*smu,
if (default_power_limit)
*default_power_limit = power_limit;
 
-   if (smu->od_enabled)
-   od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   else
-   od_percent_upper = 0;
+   if (powerplay_table) {
+   if (smu->od_enabled)
+   od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   else
+   od_percent_upper = 0;
 
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   }
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
od_percent_upper, 
od_percent_lower, power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
index ed189a3878eb..65bba5fc2335 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
@@ -2339,7 +2339,7 @@ static int navi10_get_power_limit(struct smu_context *smu,
(struct smu_11_0_powerplay_table 
*)smu->smu_table.power_play_table;
struct smu_11_0_overdrive_table *od_settings = smu->od_settings;
PPTable_t *pptable = smu->smu_table.driver_pptable;
-   uint32_t power_limit, od_percent_upper, od_percent_lower;
+   uint32_t power_limit, od_percent_upper = 0, od_percent_lower = 0;
 
	if (smu_v11_0_get_current_power_limit(smu, &power_limit)) {
/* the last hope to figure out the ppt limit */
@@ -2356,13 +2356,15 @@ static int navi10_get_power_limit(struct smu_context 
*smu,
if (default_power_limit)
*default_power_limit = power_limit;
 
-   if (smu->od_enabled &&
-   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT))
-   od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-   else
-   od_percent_upper = 0;
+   if (powerplay_table) {
+   if (smu->od_enabled &&
+   navi10_od_feature_is_supported(od_settings, 
SMU_11_0_ODCAP_POWER_LIMIT))
+   od_percent_upper = 
le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   else
+   od_percent_upper = 0;
 
-   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   od_percent_lower = 
le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
+   }
 
dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d 
(default power: %d)\n",
od_percent_upper, od_percent_lower, 
power_limit);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index a405424dd699..9371e6e79c56 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ 

RE: [PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Kamal, Asad
[AMD Official Use Only - General]

Reviewed-by: Asad Kamal 

Thanks & Regards
Asad

-Original Message-
From: Lazar, Lijo 
Sent: Wednesday, March 13, 2024 3:12 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking ; Deucher, Alexander 
; Kamal, Asad 
Subject: [PATCH] drm/amdgpu: Do a basic health check before reset

Check if the device is present on the bus before trying to recover. It could be
that the device itself is lost from the bus in some hang situations.

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1e9454e6e4cb..b37113b79483 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5536,6 +5536,23 @@ static inline void 
amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)

 }

+static int amdgpu_device_health_check(struct list_head *device_list_handle)
+{
+   struct amdgpu_device *tmp_adev;
+   int ret = 0;
+   u32 status;
+
+   list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+		pci_read_config_dword(tmp_adev->pdev, PCI_COMMAND, &status);
+   if (PCI_POSSIBLE_ERROR(status)) {
+   dev_err(tmp_adev->dev, "device lost from bus!");
+   ret = -ENODEV;
+   }
+   }
+
+   return ret;
+}
+
 /**
  * amdgpu_device_gpu_recover - reset the asic and recover scheduler
  *
@@ -5607,6 +5624,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
		device_list_handle = &device_list;
}

+   if (!amdgpu_sriov_vf(adev)) {
+   r = amdgpu_device_health_check(device_list_handle);
+   if (r)
+   goto end_reset;
+   }
+
/* We need to lock reset domain only once both for XGMI and single 
device */
tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
reset_list);
@@ -5772,6 +5795,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
reset_list);
amdgpu_device_unlock_reset_domain(tmp_adev->reset_domain);

+end_reset:
if (hive) {
		mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
--
2.25.1



[PATCH] drm/amdgpu: Do a basic health check before reset

2024-03-13 Thread Lijo Lazar
Check if the device is present on the bus before trying to recover. It
could be that the device itself is lost from the bus in some hang
situations.
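
The check itself relies on the fact that config space reads from a device that
has dropped off the bus complete as all 1s; a minimal, self-contained sketch of
that idea (the kernel's PCI_POSSIBLE_ERROR() macro is assumed to test for the
same all-ones pattern):

#include <stdbool.h>
#include <stdint.h>

/* A lost device cannot drive the bus, so config reads come back as 0xffffffff. */
static bool device_probably_lost(uint32_t pci_command_val)
{
	return pci_command_val == 0xffffffffu;
}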

Signed-off-by: Lijo Lazar 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1e9454e6e4cb..b37113b79483 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5536,6 +5536,23 @@ static inline void 
amdgpu_device_stop_pending_resets(struct amdgpu_device *adev)
 
 }
 
+static int amdgpu_device_health_check(struct list_head *device_list_handle)
+{
+   struct amdgpu_device *tmp_adev;
+   int ret = 0;
+   u32 status;
+
+   list_for_each_entry(tmp_adev, device_list_handle, reset_list) {
+		pci_read_config_dword(tmp_adev->pdev, PCI_COMMAND, &status);
+   if (PCI_POSSIBLE_ERROR(status)) {
+   dev_err(tmp_adev->dev, "device lost from bus!");
+   ret = -ENODEV;
+   }
+   }
+
+   return ret;
+}
+
 /**
  * amdgpu_device_gpu_recover - reset the asic and recover scheduler
  *
@@ -5607,6 +5624,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
		device_list_handle = &device_list;
}
 
+   if (!amdgpu_sriov_vf(adev)) {
+   r = amdgpu_device_health_check(device_list_handle);
+   if (r)
+   goto end_reset;
+   }
+
/* We need to lock reset domain only once both for XGMI and single 
device */
tmp_adev = list_first_entry(device_list_handle, struct amdgpu_device,
reset_list);
@@ -5772,6 +5795,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
reset_list);
amdgpu_device_unlock_reset_domain(tmp_adev->reset_domain);
 
+end_reset:
if (hive) {
		mutex_unlock(&hive->hive_lock);
amdgpu_put_xgmi_hive(hive);
-- 
2.25.1



[PATCH] drm/amdgpu: correct the KGQ fallback message

2024-03-13 Thread Prike Liang
Fix the KGQ fallback error message, as this will
help differentiate it from a failure in the KCQ enablement.

Signed-off-by: Prike Liang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 4835d6d899e7..d9dc5485 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -686,7 +686,7 @@ int amdgpu_gfx_enable_kgq(struct amdgpu_device *adev, int 
xcc_id)
r = amdgpu_ring_test_helper(kiq_ring);
	spin_unlock(&kiq->ring_lock);
if (r)
-   DRM_ERROR("KCQ enable failed\n");
+   DRM_ERROR("KGQ enable failed\n");
 
return r;
 }
-- 
2.34.1



Re: [PATCH] drm/amdgpu: cleanup unused variable

2024-03-13 Thread Christian König

Am 12.03.24 um 16:31 schrieb Shashank Sharma:

This patch removes an unused input variable in the MES
doorbell function.

Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Shashank Sharma 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 10 +++---
  1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index 89ac50405e25..7615daf89ba5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -40,7 +40,6 @@ int amdgpu_mes_doorbell_process_slice(struct amdgpu_device 
*adev)
  }
  
  static int amdgpu_mes_kernel_doorbell_get(struct amdgpu_device *adev,

-struct amdgpu_mes_process *process,
 int ip_type, uint64_t *doorbell_index)
  {
unsigned int offset, found;
@@ -65,7 +64,6 @@ static int amdgpu_mes_kernel_doorbell_get(struct 
amdgpu_device *adev,
  }
  
  static void amdgpu_mes_kernel_doorbell_free(struct amdgpu_device *adev,

-  struct amdgpu_mes_process *process,
   uint32_t doorbell_index)
  {
unsigned int old, rel_index;
@@ -623,7 +621,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int 
gang_id,
*queue_id = queue->queue_id = r;
  
  	/* allocate a doorbell index for the queue */

-   r = amdgpu_mes_kernel_doorbell_get(adev, gang->process,
+   r = amdgpu_mes_kernel_doorbell_get(adev,
  qprops->queue_type,
					   &qprops->doorbell_off);
if (r)
@@ -681,8 +679,7 @@ int amdgpu_mes_add_hw_queue(struct amdgpu_device *adev, int 
gang_id,
return 0;
  
  clean_up_doorbell:

-   amdgpu_mes_kernel_doorbell_free(adev, gang->process,
-  qprops->doorbell_off);
+   amdgpu_mes_kernel_doorbell_free(adev, qprops->doorbell_off);
  clean_up_queue_id:
	spin_lock_irqsave(&adev->mes.queue_id_lock, flags);
idr_remove(>mes.queue_id_idr, queue->queue_id);
@@ -736,8 +733,7 @@ int amdgpu_mes_remove_hw_queue(struct amdgpu_device *adev, 
int queue_id)
  queue_id);
  
  	list_del(>list);

-   amdgpu_mes_kernel_doorbell_free(adev, gang->process,
-  queue->doorbell_off);
+   amdgpu_mes_kernel_doorbell_free(adev, queue->doorbell_off);
	amdgpu_mes_unlock(&adev->mes);
  
  	amdgpu_mes_queue_free_mqd(queue);




[PATCH 3/3] drm/amdgpu: make reset method configurable for RAS poison

2024-03-13 Thread Tao Zhou
Each RAS block has a different requirement for GPU reset in poison
consumption handling.
Add support for mmhub RAS poison consumption handling.
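
With the reset argument now a flag word instead of a bool, a caller can request
the reset method that fits its block; a hypothetical call site might look like
the lines below (passing mode-2, the previously hard-coded choice, is only an
example):

	amdgpu_amdkfd_ras_poison_consumption_handler(adev, AMDGPU_RAS_BLOCK__GFX,
						     AMDGPU_RAS_GPU_RESET_MODE2_RESET);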

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c   | 14 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h   |  4 ++--
 .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c  | 20 ++-
 .../gpu/drm/amd/amdkfd/kfd_int_process_v9.c   | 20 ++-
 7 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 9687650b0fe3..262d20167039 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -760,7 +760,7 @@ bool amdgpu_amdkfd_is_fed(struct amdgpu_device *adev)
 }
 
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev,
-   enum amdgpu_ras_block block, bool reset)
+   enum amdgpu_ras_block block, uint32_t reset)
 {
amdgpu_umc_poison_handler(adev, block, reset);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 03bf20e0e3da..ad50c7bbc326 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -400,7 +400,7 @@ void amdgpu_amdkfd_debug_mem_fence(struct amdgpu_device 
*adev);
 int amdgpu_amdkfd_get_tile_config(struct amdgpu_device *adev,
struct tile_config *config);
 void amdgpu_amdkfd_ras_poison_consumption_handler(struct amdgpu_device *adev,
-   enum amdgpu_ras_block block, bool reset);
+   enum amdgpu_ras_block block, uint32_t reset);
 bool amdgpu_amdkfd_is_fed(struct amdgpu_device *adev);
 bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device *adev, struct kgd_mem 
*mem);
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index e32a186c2de1..58fe7bebdf1b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2045,7 +2045,7 @@ static void 
amdgpu_ras_interrupt_poison_consumption_handler(struct ras_manager *
}
}
 
-   amdgpu_umc_poison_handler(adev, obj->head.block, false);
+   amdgpu_umc_poison_handler(adev, obj->head.block, 0);
 
if (block_obj->hw_ops && block_obj->hw_ops->handle_poison_consumption)
poison_stat = 
block_obj->hw_ops->handle_poison_consumption(adev);
@@ -2698,7 +2698,7 @@ static int amdgpu_ras_page_retirement_thread(void *param)
		atomic_dec(&con->page_retirement_req_cnt);
 
amdgpu_umc_bad_page_polling_timeout(adev,
-   false, MAX_UMC_POISON_POLLING_TIME_ASYNC);
+   0, MAX_UMC_POISON_POLLING_TIME_ASYNC);
}
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index 20436f81856a..2c02585dcbff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -186,9 +186,7 @@ static int amdgpu_umc_do_page_retirement(struct 
amdgpu_device *adev,
amdgpu_umc_handle_bad_pages(adev, ras_error_status);
 
if (err_data->ue_count && reset) {
-   /* use mode-2 reset for poison consumption */
-   if (!entry)
-   con->gpu_reset_flags |= 
AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+   con->gpu_reset_flags |= reset;
amdgpu_ras_reset_gpu(adev);
}
 
@@ -196,7 +194,7 @@ static int amdgpu_umc_do_page_retirement(struct 
amdgpu_device *adev,
 }
 
 int amdgpu_umc_bad_page_polling_timeout(struct amdgpu_device *adev,
-   bool reset, uint32_t timeout_ms)
+   uint32_t reset, uint32_t timeout_ms)
 {
struct ras_err_data err_data;
struct ras_common_if head = {
@@ -238,8 +236,7 @@ int amdgpu_umc_bad_page_polling_timeout(struct 
amdgpu_device *adev,
if (reset) {
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 
-   /* use mode-2 reset for poison consumption */
-   con->gpu_reset_flags |= AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+   con->gpu_reset_flags |= reset;
amdgpu_ras_reset_gpu(adev);
}
 
@@ -247,7 +244,7 @@ int amdgpu_umc_bad_page_polling_timeout(struct 
amdgpu_device *adev,
 }
 
 int amdgpu_umc_poison_handler(struct amdgpu_device *adev,
-   enum amdgpu_ras_block block, bool reset)
+   enum amdgpu_ras_block block, uint32_t reset)
 {
int ret = AMDGPU_RAS_SUCCESS;
 
@@ -311,7 +308,8 @@ int amdgpu_umc_process_ras_data_cb(struct amdgpu_device 

[PATCH 2/3] drm/amdgpu: support utcl2 RAS poison query for mmhub

2024-03-13 Thread Tao Zhou
Support the query for both gfxhub and mmhub, and also replace
xcc_id with hub_inst.
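
A hypothetical call site for illustration: hub_type 0 selects the gfxhub query
and any non-zero value selects the mmhub query (the KFD interrupt handlers pass
vmid_type here), while hub_inst picks the hub instance:

	bool poisoned;

	/* gfxhub: hub_type == 0 */
	poisoned = amdgpu_amdkfd_ras_query_utcl2_poison_status(adev, hub_inst, 0);
	/* mmhub: any non-zero hub_type */
	poisoned = amdgpu_amdkfd_ras_query_utcl2_poison_status(adev, hub_inst, 1);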

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c  | 17 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  3 +--
 .../gpu/drm/amd/amdkfd/kfd_int_process_v10.c| 17 +++--
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 17 +++--
 5 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index fa958cbc603a..9687650b0fe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -782,12 +782,19 @@ int amdgpu_amdkfd_send_close_event_drain_irq(struct 
amdgpu_device *adev,
 }
 
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev,
-   int xcc_id)
+   int hub_inst, int hub_type)
 {
-   if (adev->gfxhub.funcs->query_utcl2_poison_status)
-   return adev->gfxhub.funcs->query_utcl2_poison_status(adev, 
xcc_id);
-   else
-   return false;
+   if (!hub_type) {
+   if (adev->gfxhub.funcs->query_utcl2_poison_status)
+   return 
adev->gfxhub.funcs->query_utcl2_poison_status(adev, hub_inst);
+   else
+   return false;
+   } else {
+   if (adev->mmhub.funcs->query_utcl2_poison_status)
+   return 
adev->mmhub.funcs->query_utcl2_poison_status(adev, hub_inst);
+   else
+   return false;
+   }
 }
 
 int amdgpu_amdkfd_check_and_lock_kfd(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 73b7fa7c5116..03bf20e0e3da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -406,7 +406,7 @@ bool amdgpu_amdkfd_bo_mapped_to_dev(struct amdgpu_device 
*adev, struct kgd_mem *
 void amdgpu_amdkfd_block_mmu_notifications(void *p);
 int amdgpu_amdkfd_criu_resume(void *p);
 bool amdgpu_amdkfd_ras_query_utcl2_poison_status(struct amdgpu_device *adev,
-   int xcc_id);
+   int hub_inst, int hub_type);
 int amdgpu_amdkfd_reserve_mem_limit(struct amdgpu_device *adev,
uint64_t size, u32 alloc_flag, int8_t xcp_id);
 void amdgpu_amdkfd_unreserve_mem_limit(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index fb19b88e5522..d615d0fc2c6a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -672,8 +672,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
 
/* for gfx fed error, kfd will handle it, return directly */
if (fed && amdgpu_ras_is_poison_mode_supported(adev) &&
-   (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2)) &&
-   (vmhub < AMDGPU_MMHUB0_START))
+   (amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2)))
return 0;
 
WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
index a8e76287dde0..650da18b0d87 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c
@@ -369,18 +369,23 @@ static void event_interrupt_wq_v10(struct kfd_node *dev,
uint16_t ring_id = SOC15_RING_ID_FROM_IH_ENTRY(ih_ring_entry);
uint32_t node_id = SOC15_NODEID_FROM_IH_ENTRY(ih_ring_entry);
uint32_t vmid_type = 
SOC15_VMID_TYPE_FROM_IH_ENTRY(ih_ring_entry);
-   int xcc_id = 0;
+   int hub_inst = 0;
struct kfd_hsa_memory_exception_data exception_data;
 
+   /* gfxhub */
if (!vmid_type && dev->adev->gfx.funcs->ih_node_to_logical_xcc) 
{
-   xcc_id = 
dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
+   hub_inst = 
dev->adev->gfx.funcs->ih_node_to_logical_xcc(dev->adev,
node_id);
-   if (xcc_id < 0)
-   xcc_id = 0;
+   if (hub_inst < 0)
+   hub_inst = 0;
}
 
-   if (client_id == SOC15_IH_CLIENTID_UTCL2 && !vmid_type &&
-   amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev, 
xcc_id)) {
+   /* mmhub */
+   if (vmid_type && client_id == SOC15_IH_CLIENTID_VMC)
+   hub_inst = node_id / 4;
+
+   if (amdgpu_amdkfd_ras_query_utcl2_poison_status(dev->adev,
+   hub_inst, vmid_type)) {

[PATCH 1/3] drm/amdgpu: add utcl2 RAS poison query for mmhub

2024-03-13 Thread Tao Zhou
Add it for mmhub v1.8.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c   | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h
index 1ca9d4ed8063..95d676ee207f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mmhub.h
@@ -63,6 +63,8 @@ struct amdgpu_mmhub_funcs {
uint64_t page_table_base);
void (*update_power_gating)(struct amdgpu_device *adev,
 bool enable);
+   bool (*query_utcl2_poison_status)(struct amdgpu_device *adev,
+   int hub_inst);
 };
 
 struct amdgpu_mmhub {
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
index c0fc44cdd658..b7aa05dbef86 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
@@ -559,6 +559,20 @@ static void mmhub_v1_8_get_clockgating(struct 
amdgpu_device *adev, u64 *flags)
 
 }
 
+static bool mmhub_v1_8_query_utcl2_poison_status(struct amdgpu_device *adev,
+   int hub_inst)
+{
+   u32 fed, status;
+
+   status = RREG32_SOC15(MMHUB, hub_inst, 
regVM_L2_PROTECTION_FAULT_STATUS);
+   fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
+   /* reset page fault status */
+   WREG32_P(SOC15_REG_OFFSET(MMHUB, hub_inst,
+   regVM_L2_PROTECTION_FAULT_STATUS), 1, ~1);
+
+   return fed;
+}
+
 const struct amdgpu_mmhub_funcs mmhub_v1_8_funcs = {
.get_fb_location = mmhub_v1_8_get_fb_location,
.init = mmhub_v1_8_init,
@@ -568,6 +582,7 @@ const struct amdgpu_mmhub_funcs mmhub_v1_8_funcs = {
.setup_vm_pt_regs = mmhub_v1_8_setup_vm_pt_regs,
.set_clockgating = mmhub_v1_8_set_clockgating,
.get_clockgating = mmhub_v1_8_get_clockgating,
+   .query_utcl2_poison_status = mmhub_v1_8_query_utcl2_poison_status,
 };
 
 static const struct amdgpu_ras_err_status_reg_entry mmhub_v1_8_ce_reg_list[] = 
{
-- 
2.34.1



Re: [PATCH] drm/amdgpu/: Remove bo_create_kernel_at path from virt page

2024-03-13 Thread Christian König

Am 12.03.24 um 18:50 schrieb Victor Skvortsov:

Use amdgpu_vram_mgr to reserve bad page ranges.
Reserved ranges will be freed by amdgpu_vram_mgr_fini().
Delete bo_create path as it is redundant.

Suggested-by: Christian König 
Signed-off-by: Victor Skvortsov 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 55 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 -
  2 files changed, 3 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 7a4eae36778a..2a20714b9c16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -244,7 +244,6 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
 */
unsigned int align_space = 512;
void *bps = NULL;
-   struct amdgpu_bo **bps_bo = NULL;
  
  	*data = kmalloc(sizeof(struct amdgpu_virt_ras_err_handler_data), GFP_KERNEL);

if (!*data)
@@ -254,12 +253,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
if (!bps)
goto bps_failure;
  
-	bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), GFP_KERNEL);

-   if (!bps_bo)
-   goto bps_bo_failure;
-
(*data)->bps = bps;
-   (*data)->bps_bo = bps_bo;
(*data)->count = 0;
(*data)->last_reserved = 0;
  
@@ -267,34 +261,12 @@ static int amdgpu_virt_init_ras_err_handler_data(struct amdgpu_device *adev)
  
  	return 0;
  
-bps_bo_failure:

-   kfree(bps);
  bps_failure:
kfree(*data);
  data_failure:
return -ENOMEM;
  }
  
-static void amdgpu_virt_ras_release_bp(struct amdgpu_device *adev)

-{
-	struct amdgpu_virt *virt = &adev->virt;
-   struct amdgpu_virt_ras_err_handler_data *data = virt->virt_eh_data;
-   struct amdgpu_bo *bo;
-   int i;
-
-   if (!data)
-   return;
-
-   for (i = data->last_reserved - 1; i >= 0; i--) {
-   bo = data->bps_bo[i];
-   if (bo) {
-			amdgpu_bo_free_kernel(&bo, NULL, NULL);
-   data->bps_bo[i] = bo;
-   }
-   data->last_reserved = i;
-   }
-}
-
  void amdgpu_virt_release_ras_err_handler_data(struct amdgpu_device *adev)
  {
	struct amdgpu_virt *virt = &adev->virt;
@@ -305,10 +277,7 @@ void amdgpu_virt_release_ras_err_handler_data(struct 
amdgpu_device *adev)
if (!data)
return;
  
-	amdgpu_virt_ras_release_bp(adev);

-
kfree(data->bps);
-   kfree(data->bps_bo);
kfree(data);
virt->virt_eh_data = NULL;
  }
@@ -330,9 +299,6 @@ static void amdgpu_virt_ras_reserve_bps(struct 
amdgpu_device *adev)
  {
	struct amdgpu_virt *virt = &adev->virt;
struct amdgpu_virt_ras_err_handler_data *data = virt->virt_eh_data;
-	struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
-	struct ttm_resource_manager *man = &mgr->manager;
-   struct amdgpu_bo *bo = NULL;
uint64_t bp;
int i;
  
@@ -341,26 +307,11 @@ static void amdgpu_virt_ras_reserve_bps(struct amdgpu_device *adev)
  
  	for (i = data->last_reserved; i < data->count; i++) {

bp = data->bps[i].retired_page;
+		if (amdgpu_vram_mgr_reserve_range(&adev->mman.vram_mgr,
+   bp << AMDGPU_GPU_PAGE_SHIFT, AMDGPU_GPU_PAGE_SIZE))
+   DRM_DEBUG("RAS WARN: reserve vram for retired page %llx 
fail\n", bp);
  
-		/* There are two cases of reserve error should be ignored:

-* 1) a ras bad page has been allocated (used by someone);
-* 2) a ras bad page has been reserved (duplicate error 
injection
-*for one page);
-*/
-   if  (ttm_resource_manager_used(man)) {
-			amdgpu_vram_mgr_reserve_range(&adev->mman.vram_mgr,
-   bp << AMDGPU_GPU_PAGE_SHIFT,
-   AMDGPU_GPU_PAGE_SIZE);
-   data->bps_bo[i] = NULL;
-   } else {
-   if (amdgpu_bo_create_kernel_at(adev, bp << 
AMDGPU_GPU_PAGE_SHIFT,
-   AMDGPU_GPU_PAGE_SIZE,
-					       &bo, NULL))
-   DRM_DEBUG("RAS WARN: reserve vram for retired page 
%llx fail\n", bp);
-   data->bps_bo[i] = bo;
-   }
data->last_reserved = i + 1;
-   bo = NULL;
}
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h

index 3f59b7b5523f..15599951e7b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
@@ -224,8 +224,6 @@ struct amdgim_vf2pf_info_v2 {
  struct amdgpu_virt_ras_err_handler_data {
/* point to bad page records array */
struct 

Re: [RFC PATCH v4 22/42] drm/vkms: Use s32 for internal color pipeline precision

2024-03-13 Thread Pekka Paalanen
On Mon, 26 Feb 2024 16:10:36 -0500
Harry Wentland  wrote:

> Certain operations require us to preserve values below 0.0 and
> above 1.0 (0x0 and 0x respectively in 16 bpc unorm). One
> such operation is a BT709 encoding operation followed by its
> decoding operation, or the reverse.
> 
> We'll use s32 values as intermediate in and outputs of our
> color operations, for the operations where it matters.
> 
> For now this won't apply to LUT operations. We might want to
> update those to work on s32 as well, but it's unclear how
> that should work for unorm LUT definitions. We'll revisit
> that once we add LUT + CTM tests.
> 
> In order to allow for this we'll also invert the nesting of our
> colorop processing loops. We now use the pixel iteration loop
> on the outside and the colorop iteration on the inside.
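
A minimal, self-contained sketch of the packing/clamping idea described above;
the types and names are simplified stand-ins, not the VKMS structs:

#include <stdint.h>

/*
 * Carry 16-bit UNORM channels as int32_t so intermediate results may go below
 * 0 or above 0xffff, and clamp only when packing back to u16 at the end of the
 * pipeline.
 */
static int32_t unpack_u16(uint16_t v)
{
	return (int32_t)v;		/* no value conversion, just head-room */
}

static uint16_t pack_u16_clamped(int32_t v)
{
	if (v < 0)
		v = 0;
	if (v > 0xffff)
		v = 0xffff;
	return (uint16_t)v;
}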

I always go through the same thought process:

- putting the pixel iteration loop outermost is going to be horrible
  for performance

- maybe should turn the temporary line buffer to argb_s32 for all of
  VKMS

- that's going to be a lot of memory traffic... maybe your way is not
  horrible for performance

- maybe processing pixel by pixel is good, if you can prepare the
  operation in advance, so you don't need to analyze colorops again and
  again on each pixel

- eh, maybe it's not a gain after all, needs benchmarking

My comment on patch 17 was like the step 0 of this train of thought. I
just always get the same comments in my mind when seeing the same code
again.


Thanks,
pq

> 
> v4:
>  - Clarify that we're packing 16-bit UNORM into s32, not
>converting values to a different representation (Pekka)
> 
> v3:
>  - Use new colorop->next pointer
> 
> Signed-off-by: Harry Wentland 
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 57 +---
>  drivers/gpu/drm/vkms/vkms_drv.h  |  4 ++
>  2 files changed, 48 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> b/drivers/gpu/drm/vkms/vkms_composer.c
> index 25b786b49db0..d2101fa55aa3 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -164,7 +164,7 @@ static void apply_lut(const struct vkms_crtc_state 
> *crtc_state, struct line_buff
>   }
>  }
>  
> -static void apply_colorop(struct pixel_argb_u16 *pixel, struct drm_colorop 
> *colorop)
> +static void apply_colorop(struct pixel_argb_s32 *pixel, struct drm_colorop 
> *colorop)
>  {
>   /* TODO is this right? */
>   struct drm_colorop_state *colorop_state = colorop->state;
> @@ -191,25 +191,56 @@ static void apply_colorop(struct pixel_argb_u16 *pixel, 
> struct drm_colorop *colo
>  
>  static void pre_blend_color_transform(const struct vkms_plane_state 
> *plane_state, struct line_buffer *output_buffer)
>  {
> - struct drm_colorop *colorop = plane_state->base.base.color_pipeline;
> + struct drm_colorop *colorop;
> + struct pixel_argb_s32 pixel;
>  
> - while (colorop) {
> - struct drm_colorop_state *colorop_state;
> + for (size_t x = 0; x < output_buffer->n_pixels; x++) {
>  
> - if (!colorop)
> - return;
> + /*
> +  * Some operations, such as applying a BT709 encoding matrix,
> +  * followed by a decoding matrix, require that we preserve
> +  * values above 1.0 and below 0.0 until the end of the pipeline.
> +  *
> +  * Pack the 16-bit UNORM values into s32 to give us head-room to
> +  * avoid clipping until we're at the end of the pipeline. Clip
> +  * intentionally at the end of the pipeline before packing
> +  * UNORM values back into u16.
> +  */
> + pixel.a = output_buffer->pixels[x].a;
> + pixel.r = output_buffer->pixels[x].r;
> + pixel.g = output_buffer->pixels[x].g;
> + pixel.b = output_buffer->pixels[x].b;
>  
> - /* TODO this is probably wrong */
> - colorop_state = colorop->state;
> + colorop = plane_state->base.base.color_pipeline;
> + while (colorop) {
> + struct drm_colorop_state *colorop_state;
>  
> - if (!colorop_state)
> - return;
> + if (!colorop)
> + return;
> +
> + /* TODO this is probably wrong */
> + colorop_state = colorop->state;
> +
> + if (!colorop_state)
> + return;
>  
> - for (size_t x = 0; x < output_buffer->n_pixels; x++)
>   if (!colorop_state->bypass)
> - apply_colorop(_buffer->pixels[x], 
> colorop);
> + apply_colorop(, colorop);
>  
> - colorop = colorop->next;
> + colorop = colorop->next;
> + }
> +
> + /* clamp pixel */
> + 

Re: [RFC PATCH v4 17/42] drm/vkms: Add enumerated 1D curve colorop

2024-03-13 Thread Pekka Paalanen
On Mon, 26 Feb 2024 16:10:31 -0500
Harry Wentland  wrote:

> This patch introduces a VKMS color pipeline that includes two
> drm_colorops for named transfer functions. For now the only ones
> supported are sRGB EOTF, sRGB Inverse EOTF, and a Linear TF.
> We will expand this in the future but I don't want to do so
> without accompanying IGT tests.
> 
> We introduce a new vkms_luts.c file that hard-codes sRGB EOTF,
> sRGB Inverse EOTF, and a linear EOTF LUT. These have been
> generated with 256 entries each as IGT is currently testing
> only 8 bpc surfaces. We will likely need higher precision
> but I'm reluctant to make that change without clear indication
> that we need it. We'll revisit and, if necessary, regenerate
> the LUTs when we have IGT tests for higher precision buffers.
> 
> v4:
>  - Drop _tf_ from color_pipeline init function
>  - Pass supported TFs into colorop init
>  - Create bypass pipeline in DRM helper (Pekka)
> 
> v2:
>  - Add commit description
>  - Fix sRGB EOTF LUT definition
>  - Add linear and sRGB inverse EOTF LUTs
> 
> Signed-off-by: Harry Wentland 
> Signed-off-by: Alex Hung 
> ---
>  drivers/gpu/drm/vkms/Makefile|   4 +-
>  drivers/gpu/drm/vkms/vkms_colorop.c  |  70 +++
>  drivers/gpu/drm/vkms/vkms_composer.c |  45 ++
>  drivers/gpu/drm/vkms/vkms_drv.h  |   4 +
>  drivers/gpu/drm/vkms/vkms_luts.c | 802 +++
>  drivers/gpu/drm/vkms/vkms_luts.h |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c|   2 +
>  7 files changed, 938 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_colorop.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.h

...

> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c 
> b/drivers/gpu/drm/vkms/vkms_composer.c
> index b90e446d5954..9493bdb1ba3f 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -12,6 +12,7 @@
>  #include 
>  
>  #include "vkms_drv.h"
> +#include "vkms_luts.h"
>  
>  static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> @@ -163,6 +164,47 @@ static void apply_lut(const struct vkms_crtc_state 
> *crtc_state, struct line_buff
>   }
>  }
>  
> +static void pre_blend_color_transform(const struct vkms_plane_state 
> *plane_state, struct line_buffer *output_buffer)
> +{
> + struct drm_colorop *colorop = plane_state->base.base.color_pipeline;
> +
> + while (colorop) {

I think this would be easier to read if you used

for (; colorop; colorop = colorop->next) {

and

> + struct drm_colorop_state *colorop_state;
> +
> + if (!colorop)
> + return;
> +
> + /* TODO this is probably wrong */
> + colorop_state = colorop->state;
> +
> + if (!colorop_state)
> + return;

if (colorop_state->bypass)
continue;

Something about 'switch (colorop->type)' to pick a function pointer to
call, but hard to see at this point of the series how that would work.

However, you can pick between srgb_inv_eotf and srgb_eotf already here.
Then inside the loop you can just call one set of
apply_lut_to_channel_value() and not need conditionals and avoid
indentation levels.
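
Roughly, something like this (a sketch written against the names visible in the
patch; the LUT struct type below is an assumption and error handling is omitted):

	const struct vkms_color_lut *lut;

	switch (colorop_state->curve_1d_type) {
	case DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF:
		lut = &srgb_inv_eotf;
		break;
	case DRM_COLOROP_1D_CURVE_SRGB_EOTF:
	default:
		lut = &srgb_eotf;
		break;
	}

	for (size_t x = 0; x < output_buffer->n_pixels; x++) {
		struct pixel_argb_u16 *pixel = &output_buffer->pixels[x];

		pixel->r = apply_lut_to_channel_value(lut, pixel->r, LUT_RED);
		pixel->g = apply_lut_to_channel_value(lut, pixel->g, LUT_GREEN);
		pixel->b = apply_lut_to_channel_value(lut, pixel->b, LUT_BLUE);
	}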

> +
> + for (size_t x = 0; x < output_buffer->n_pixels; x++) {
> +		struct pixel_argb_u16 *pixel = &output_buffer->pixels[x];
> +
> + if (colorop->type == DRM_COLOROP_1D_CURVE &&
> + colorop_state->bypass == false) {
> + switch (colorop_state->curve_1d_type) {
> + case DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF:
> + pixel->r = 
> apply_lut_to_channel_value(&srgb_inv_eotf, pixel->r, LUT_RED);
> + pixel->g = 
> apply_lut_to_channel_value(&srgb_inv_eotf, pixel->g, LUT_GREEN);
> + pixel->b = 
> apply_lut_to_channel_value(&srgb_inv_eotf, pixel->b, LUT_BLUE);
> + break;
> + case DRM_COLOROP_1D_CURVE_SRGB_EOTF:
> + default:
> + pixel->r = 
> apply_lut_to_channel_value(&srgb_eotf, pixel->r, LUT_RED);
> + pixel->g = 
> apply_lut_to_channel_value(&srgb_eotf, pixel->g, LUT_GREEN);
> + pixel->b = 
> apply_lut_to_channel_value(&srgb_eotf, pixel->b, LUT_BLUE);
> + break;
> + }
> + }

else { aaargh_unknown_colorop(); }

> + }
> +
> + colorop = colorop->next;
> + }
> +}

...

> diff --git a/drivers/gpu/drm/vkms/vkms_luts.c 
> b/drivers/gpu/drm/vkms/vkms_luts.c
> new file mode 100644
> index 

Re: [PATCH v2] drm/amdgpu: Clear the hotplug interrupt ack bit before hpd initialization

2024-03-13 Thread Qiang Ma
On Wed, 31 Jan 2024 15:57:03 +0800
Qiang Ma  wrote:

Hello everyone, please help review this patch.

  Qiang Ma

> Problem:
> If the HDMI display is unplugged while the BIOS is initializing and is
> plugged back in only after the system is up, the hotplug interrupt
> handler is never entered and the display stays dark.
> 
> Fix:
> When the above problem occurs, the HPD ack interrupt bit is already 1,
> so the stale interrupt should be cleared during hpd_init so that once
> the driver is ready it can respond to the HPD interrupt normally.
> 
> Signed-off-by: Qiang Ma 
> ---
> v2:
>  - Remove unused variable 'tmp'
>  - Fixed function spelling errors
>  
> drivers/gpu/drm/amd/amdgpu/dce_v10_0.c |  2 ++
>  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c |  2 ++
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c  | 22 ++
>  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c  | 22 ++
>  4 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c index
> bb666cb7522e..12a8ba929a72 100644 ---
> a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c +++
> b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c @@ -51,6 +51,7 @@
>  
>  static void dce_v10_0_set_display_funcs(struct amdgpu_device *adev);
>  static void dce_v10_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v10_0_hpd_int_ack(struct amdgpu_device *adev, int
> hpd); 
>  static const u32 crtc_offsets[] = {
>   CRTC0_REGISTER_OFFSET,
> @@ -363,6 +364,7 @@ static void dce_v10_0_hpd_init(struct
> amdgpu_device *adev) AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp); 
> +		dce_v10_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
> 		dce_v10_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
> 		amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector->hpd.hpd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c index
> 7af277f61cca..745e4fdffade 100644 ---
> a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c +++
> b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c @@ -51,6 +51,7 @@
>  
>  static void dce_v11_0_set_display_funcs(struct amdgpu_device *adev);
>  static void dce_v11_0_set_irq_funcs(struct amdgpu_device *adev);
> +static void dce_v11_0_hpd_int_ack(struct amdgpu_device *adev, int
> hpd); 
>  static const u32 crtc_offsets[] =
>  {
> @@ -387,6 +388,7 @@ static void dce_v11_0_hpd_init(struct
> amdgpu_device *adev) AMDGPU_HPD_DISCONNECT_INT_DELAY_IN_MS);
>   WREG32(mmDC_HPD_TOGGLE_FILT_CNTL +
> hpd_offsets[amdgpu_connector->hpd.hpd], tmp); 
> +		dce_v11_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
> 		dce_v11_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
> 		amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector->hpd.hpd);
> 	}
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c index
> 143efc37a17f..28c4a735716b 100644 ---
> a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c +++
> b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c @@ -272,6 +272,21 @@ static
> void dce_v6_0_hpd_set_polarity(struct amdgpu_device *adev,
> WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp); }
>  
> +static void dce_v6_0_hpd_int_ack(struct amdgpu_device *adev,
> +  int hpd)
> +{
> + u32 tmp;
> +
> + if (hpd >= adev->mode_info.num_hpd) {
> + DRM_DEBUG("invalid hdp %d\n", hpd);
> + return;
> + }
> +
> + tmp = RREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd]);
> + tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> + WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd], tmp);
> +}
> +
>  /**
>   * dce_v6_0_hpd_init - hpd setup callback.
>   *
> @@ -311,6 +326,7 @@ static void dce_v6_0_hpd_init(struct
> amdgpu_device *adev) continue;
>   }
>  
> +		dce_v6_0_hpd_int_ack(adev, amdgpu_connector->hpd.hpd);
> 		dce_v6_0_hpd_set_polarity(adev, amdgpu_connector->hpd.hpd);
> 		amdgpu_irq_get(adev, &adev->hpd_irq, amdgpu_connector->hpd.hpd);
> 	}
> @@ -3088,7 +3104,7 @@ static int dce_v6_0_hpd_irq(struct
> amdgpu_device *adev, struct amdgpu_irq_src *source,
>   struct amdgpu_iv_entry *entry)
>  {
> - uint32_t disp_int, mask, tmp;
> + uint32_t disp_int, mask;
>   unsigned hpd;
>  
>   if (entry->src_data[0] >= adev->mode_info.num_hpd) {
> @@ -3101,9 +3117,7 @@ static int dce_v6_0_hpd_irq(struct
> amdgpu_device *adev, mask = interrupt_status_offsets[hpd].hpd;
>  
>   if (disp_int & mask) {
> - tmp = RREG32(mmDC_HPD1_INT_CONTROL +
> hpd_offsets[hpd]);
> - tmp |= DC_HPD1_INT_CONTROL__DC_HPD1_INT_ACK_MASK;
> - WREG32(mmDC_HPD1_INT_CONTROL + hpd_offsets[hpd],
> tmp);
> + dce_v6_0_hpd_int_ack(adev, hpd);
>   schedule_delayed_work(>hotplug_work, 0);
>   DRM_DEBUG("IH: HPD%d\n", hpd + 1);
>   }
> diff 

Re: [RFC PATCH v4 10/42] drm/colorop: Add TYPE property

2024-03-13 Thread Pekka Paalanen
On Mon, 26 Feb 2024 16:10:24 -0500
Harry Wentland  wrote:

> Add a read-only TYPE property. The TYPE specifies the colorop
> type, such as enumerated curve, 1D LUT, CTM, 3D LUT, PWL LUT,
> etc.
> 
> v4:
>  - Use enum property for TYPE (Pekka)
> 
> v3:
>  - Make TYPE a range property
>  - Move enum drm_colorop_type to uapi header
>  - Fix drm_get_colorop_type_name description
> 
> For now we're only introducing an enumerated 1D LUT type to
> illustrate the concept.
> 
> Signed-off-by: Harry Wentland 
> ---
>  drivers/gpu/drm/drm_atomic.c  |  4 +--
>  drivers/gpu/drm/drm_atomic_uapi.c |  8 +-
>  drivers/gpu/drm/drm_colorop.c | 44 ++-
>  include/drm/drm_colorop.h | 17 +++-
>  include/uapi/drm/drm_mode.h   |  4 +++
>  5 files changed, 72 insertions(+), 5 deletions(-)

...

> diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
> index a295ab96aee1..3a07a8fe2842 100644
> --- a/drivers/gpu/drm/drm_colorop.c
> +++ b/drivers/gpu/drm/drm_colorop.c
> @@ -32,12 +32,17 @@
>  
>  /* TODO big colorop doc, including properties, etc. */
>  
> +static const struct drm_prop_enum_list drm_colorop_type_enum_list[] = {
> + { DRM_COLOROP_1D_CURVE, "1D Curve" },
> +};
> +
>  /* Init Helpers */
>  
>  int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
> -  struct drm_plane *plane)
> +  struct drm_plane *plane, enum drm_colorop_type type)
>  {
> 	struct drm_mode_config *config = &dev->mode_config;
> + struct drm_property *prop;
>   int ret = 0;
>  
> 	ret = drm_mode_object_add(dev, &colorop->base, DRM_MODE_OBJECT_COLOROP);
> @@ -46,12 +51,29 @@ int drm_colorop_init(struct drm_device *dev, struct 
> drm_colorop *colorop,
>  
> 	colorop->base.properties = &colorop->properties;
>   colorop->dev = dev;
> + colorop->type = type;
>   colorop->plane = plane;
>  
> 	list_add_tail(&colorop->head, &config->colorop_list);
>   colorop->index = config->num_colorop++;
>  
>   /* add properties */
> +
> + /* type */
> + prop = drm_property_create_enum(dev,
> + DRM_MODE_PROP_IMMUTABLE,
> + "TYPE", drm_colorop_type_enum_list,
> + ARRAY_SIZE(drm_colorop_type_enum_list));

This list here is the list of all values the enum could take, right?
Should it not be just the one value it's going to have? It'll never
change, and it can never be changed.

> +
> + if (!prop)
> + return -ENOMEM;
> +
> + colorop->type_property = prop;
> +
> +	drm_object_attach_property(&colorop->base,
> +colorop->type_property,
> +colorop->type);
> +
>   return ret;
>  }
>  EXPORT_SYMBOL(drm_colorop_init);

Thanks,
pq


pgpRJes9FP7mu.pgp
Description: OpenPGP digital signature


Re: [RFC PATCH v4 10/42] drm/colorop: Add TYPE property

2024-03-13 Thread Pekka Paalanen
On Tue, 12 Mar 2024 15:15:13 +
Simon Ser  wrote:

> On Tuesday, March 12th, 2024 at 16:02, Pekka Paalanen 
>  wrote:
> 
> > This list here is the list of all values the enum could take, right?
> > Should it not be just the one value it's going to have? It'll never
> > change, and it can never be changed.  
> 
> Listing all possible values is how other properties behave. Example:
> 
> "type" (immutable): enum {Overlay, Primary, Cursor} = Primary

Yeah, that's weird, now that you pointed it out.

Userspace does nothing with the knowledge of what this specific
object's property "could" be since it's immutable. Only the kernel
internals and UAPI docs need the full list, and userspace needs to
hardcode the full list in general so it can recognize each value. But
on the property instance on a specific object, I see no use for the
full list.
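
Concretely, a sketch of the alternative (not checked against the property core,
which may or may not accept a single-entry enum list like this):

	/* expose only this object's own type, not every possible type */
	struct drm_prop_enum_list entry = {
		.type = type,
		.name = drm_get_colorop_type_name(type),
	};

	prop = drm_property_create_enum(dev, DRM_MODE_PROP_IMMUTABLE, "TYPE",
					&entry, 1);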


Thanks,
pq


pgpWoSj9nkKp3.pgp
Description: OpenPGP digital signature


RE: [PATCH 03/43] drm/amd/display: Enable 2to1 ODM policy for DCN35

2024-03-13 Thread Lin, Wayne
[Public]

Appreciate the feedback!

Regards,
Wayne
> -Original Message-
> From: Christian König 
> Sent: Tuesday, March 12, 2024 6:21 PM
> To: Lin, Wayne ; amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry ; Li, Sun peng (Leo)
> ; Siqueira, Rodrigo ;
> Pillai, Aurabindo ; Li, Roman
> ; Gutierrez, Agustin ;
> Chung, ChiaHsuan (Tom) ; Wu, Hersen
> ; Zuo, Jerry 
> Subject: Re: [PATCH 03/43] drm/amd/display: Enable 2to1 ODM policy for
> DCN35
>
> Just another general comment on how to upstream patches.
>
> When publishing a large set of patches it is usually good convention to sort
> them:
> 1. Bug fixes which might even get backported
> 2. Comment and other non-functional cleanups
> 3. Functional cleanups
> 4. New features
>
> One good reason for that is that it usually makes fixes much easier to port to
> older kernel versions, but it also makes things easier to review.
>
> If you are in doubt if a patch set is still fully compiling after re-ordering 
> things
> you can use the command
>
> git rebase -x make base_branch
>
> on your branch, and git will then run make after applying each patch. This
> way you can double check that everything still builds fine.
>
> Working like that is not a must have, but really good practice.
>
> Regards,
> Christian.
>
> Am 12.03.24 um 10:19 schrieb Wayne Lin:
> > From: Rodrigo Siqueira 
> >
> > [Why & How]
> > Enable 2to1 ODM policy for DCN35
> >
> > Acked-by: Wayne Lin 
> > Signed-off-by: Rodrigo Siqueira 
> > ---
> >   drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git
> > a/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
> > b/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
> > index 5d52853cac96..a8f4023ff3b1 100644
> > --- a/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
> > +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
> > @@ -769,7 +769,7 @@ static const struct dc_debug_options
> debug_defaults_drv = {
> > .support_eDP1_5 = true,
> > .enable_hpo_pg_support = false,
> > .enable_legacy_fast_update = true,
> > -   .enable_single_display_2to1_odm_policy = false,
> > +   .enable_single_display_2to1_odm_policy = true,
> > .disable_idle_power_optimizations = false,
> > .dmcub_emulation = false,
> > .disable_boot_optimizations = false,