
Hi Lijo,

Just ran some tests and yes, I think we should still reflect everything. Even 
with the extended peer link command disabled, if we do not reflect, the table 
will only be filled beneath the diagonal (see attached). So I think the patch 
is good as is on that front.

Let me know if there's anything else you want me to look into.
Thanks,
Will
-----Original Message-----
From: Aitken, Will
Sent: Wednesday, October 29, 2025 9:16 AM
To: Lazar, Lijo <[email protected]>; [email protected]
Cc: Skvortsov, Victor <[email protected]>
Subject: RE: [PATCH 1/3] drm/amdgpu: Refactor sriov xgmi topology filling to 
common code

Hi Lijo,

Thanks for reviewing. Good question. I don't think it's necessary since it 
wasn't being done before (the equivalent xgmi_fill function was only called if 
sharing was enabled). I don't think it hurts to do it, but it might be 
redundant (the port num copying at least would be zeros for the other nodes 
since it does not receive extended data). Not sure about the 
is_sharing_enabled. I will run some experiments and add checks to only reflect 
if necessary.

Thanks,
Will

-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Tuesday, October 28, 2025 9:16 AM
To: Aitken, Will <[email protected]>; [email protected]
Cc: Skvortsov, Victor <[email protected]>; Aitken, Will 
<[email protected]>
Subject: Re: [PATCH 1/3] drm/amdgpu: Refactor sriov xgmi topology filling to 
common code



On 10/8/2025 7:02 PM, [email protected] wrote:
> From: Will Aitken <[email protected]>
>
> amdgpu_xgmi_fill_topology_info and psp_xgmi_reflect_topology_info
> perform the same logic of copying topology info of one node to every
> other node in the hive. Instead of having two functions that purport
> to do the same thing, this refactoring moves the logic of the fill
> function to the reflect function and adds reflecting port number info
> as well for complete functionality.
>
> Signed-off-by: Will Aitken <[email protected]>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  | 19 ++++++++++++-----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 27 ------------------------
>   2 files changed, 14 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 6208a49c9f23..82500ade240d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -1539,6 +1539,7 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
>       uint64_t src_node_id = psp->adev->gmc.xgmi.node_id;
>       uint64_t dst_node_id = node_info.node_id;
>       uint8_t dst_num_hops = node_info.num_hops;
> +     uint8_t dst_is_sharing_enabled = node_info.is_sharing_enabled;
>       uint8_t dst_num_links = node_info.num_links;
>
>       hive = amdgpu_get_xgmi_hive(psp->adev);
> @@ -1558,13 +1559,20 @@ static void psp_xgmi_reflect_topology_info(struct psp_context *psp,
>                               continue;
>
>                       mirror_top_info->nodes[j].num_hops = dst_num_hops;
> -                     /*
> -                      * prevent 0 num_links value re-reflection since reflection
> +                     mirror_top_info->nodes[j].is_sharing_enabled = dst_is_sharing_enabled;
> +                     /* prevent 0 num_links value re-reflection since reflection
>                        * criteria is based on num_hops (direct or indirect).
> -                      *
>                        */
> -                     if (dst_num_links)
> +                     if (dst_num_links) {

Patches look fine. One clarification - do you need to fill this information if 
sharing is disabled?

Thanks,
Lijo

>                               mirror_top_info->nodes[j].num_links = dst_num_links;
> +                             /* swap src and dst due to frame of reference */
> +                             for (int k = 0; k < dst_num_links; k++) {
> +                                     mirror_top_info->nodes[j].port_num[k].src_xgmi_port_num =
> +                                             node_info.port_num[k].dst_xgmi_port_num;
> +                                     mirror_top_info->nodes[j].port_num[k].dst_xgmi_port_num =
> +                                             node_info.port_num[k].src_xgmi_port_num;
> +                             }
> +                     }
>
>                       break;
>               }
> @@ -1639,7 +1647,8 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
>                       amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
>                               IP_VERSION(13, 0, 6) ||
>                       amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
> -                             IP_VERSION(13, 0, 14);
> +                             IP_VERSION(13, 0, 14) ||
> +                     amdgpu_sriov_vf(psp->adev);
>               bool ta_port_num_support = amdgpu_sriov_vf(psp->adev) ? 0 :
>                               psp->xgmi_context.xgmi_ta_caps & EXTEND_PEER_LINK_INFO_CMD_FLAG;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 1ede308a7c67..2e70b84a8c3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -958,28 +958,6 @@ static int amdgpu_xgmi_initialize_hive_get_data_partition(struct amdgpu_hive_inf
>       return 0;
>   }
>
> -static void amdgpu_xgmi_fill_topology_info(struct amdgpu_device *adev,
> -     struct amdgpu_device *peer_adev)
> -{
> -     struct psp_xgmi_topology_info *top_info = &adev->psp.xgmi_context.top_info;
> -     struct psp_xgmi_topology_info *peer_info = &peer_adev->psp.xgmi_context.top_info;
> -
> -     for (int i = 0; i < peer_info->num_nodes; i++) {
> -             if (peer_info->nodes[i].node_id == adev->gmc.xgmi.node_id) {
> -                     for (int j = 0; j < top_info->num_nodes; j++) {
> -                             if (top_info->nodes[j].node_id == peer_adev->gmc.xgmi.node_id) {
> -                                     peer_info->nodes[i].num_hops = top_info->nodes[j].num_hops;
> -                                     peer_info->nodes[i].is_sharing_enabled =
> -                                                     top_info->nodes[j].is_sharing_enabled;
> -                                     peer_info->nodes[i].num_links =
> -                                                     top_info->nodes[j].num_links;
> -                                     return;
> -                             }
> -                     }
> -             }
> -     }
> -}
> -
>   int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>   {
>       struct psp_xgmi_topology_info *top_info;
> @@ -1065,11 +1043,6 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
>                               /* To do: continue with some node failed or disable the whole hive*/
>                               goto exit_unlock;
>                       }
> -
> -                     /* fill the topology info for peers instead of getting from PSP */
> -                     list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
> -                             amdgpu_xgmi_fill_topology_info(adev, tmp_adev);
> -                     }
>               } else {
>                       /* get latest topology info for each device from psp */
>                       list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
