Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/29/2022 3:37 PM, Lazar, Lijo wrote:

To be clear your understanding - nothing is automatic in PMFW. PMFW picks a priority based on the actual mask sent by the driver.

Assuming lower bits correspond to highest priority: if the driver sends a mask with Bit3 and Bit0 set, PMFW will choose the profile that corresponds to Bit0. If the driver sends a mask with Bit4 and Bit2 set and the rest unset, PMFW will choose the profile that corresponds to Bit2. However, if the driver sends a mask with only a single bit set, it chooses that profile regardless of whatever the previous profile was. It doesn't check whether the existing profile > the newly requested one. That is the behavior.

So if a job chooses a profile that corresponds to Bit0, the driver will send that. The next time another job chooses a profile that corresponds to Bit1, PMFW will receive that as the new profile and switch to it. It trusts the driver to send the proper workload mask. Hope that gives the picture.

On 9/29/2022 7:30 PM, Sharma, Shashank wrote:

Thanks, my understanding is also similar, referring to the core power switch profile function here: amd_powerplay.c::pp_dpm_switch_power_profile()

    /* snip code */
    hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]);
    index = fls(hwmgr->workload_mask);
    index = index <= Workload_Policy_Max ? index - 1 : 0;
    workload = hwmgr->workload_setting[index];
    /* snip code */
    hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, &workload, 0);

Here I can see that the new workload mask is OR'ed into the existing workload mask (not overwritten). So if we keep sending new workload modes, they are appended into the workload flags, and finally the PM will pick the most aggressive one of all these flags, as per its policy.

On Thu, Sep 29, 2022 at 10:14 AM, Lazar, Lijo wrote:

Actually it's misleading - the path for sienna is:

    set_power_profile_mode -> sienna_cichlid_set_power_profile_mode

This code picks one profile based on a lookup table:

    workload_type = smu_cmn_to_asic_specific_index(smu,
                                                   CMN2ASIC_MAPPING_WORKLOAD,
                                                   smu->power_profile_mode);

This is that lookup table:

    static struct cmn2asic_mapping sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = {
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT, WORKLOAD_PPLIB_DEFAULT_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D,   WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING,    WORKLOAD_PPLIB_POWER_SAVING_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO,          WORKLOAD_PPLIB_VIDEO_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR,             WORKLOAD_PPLIB_VR_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE,        WORKLOAD_PPLIB_COMPUTE_BIT),
        WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM,         WORKLOAD_PPLIB_CUSTOM_BIT),
    };

And this is the place of interaction with PMFW; (1 << workload_type) is the mask being sent:

    smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask,
                                    1 << workload_type, NULL);

In the end, the driver implementation expects only one bit to be set.

On 9/30/2022 12:02 AM, Alex Deucher wrote:

Shashank and I had a discussion about this today. I think there are a few things we can do to handle this better:

1. Set a flag so that if the user changes the default via sysfs, that overrides any runtime setting via an application, since presumably that is what the user wants, and we won't change the hint at runtime.

2. Drop the GET API. There's no need for this; the hint is just a hint.

3. Have the driver arbitrate between the available workload profiles based on the numeric value of the hint (e.g., default < 3D < video < VR < compute), as the higher values are more aggressive in most cases. If requests come in for 3D and compute at the same time, the driver will select compute because its value is highest. Each hint type would be ref counted so we'd know what state to be in every time we go to set the state. If all of the clients requesting compute go away and only 3D requestors remain, we can switch to 3D. If all refcounts go to 0, we go back to default. This will not require any change to the current workload API in the SMU code.

On 9/30/2022 7:08 AM, Lazar, Lijo wrote:

On point 2: double-checked again based on Felix's comments on the API definition. The driver decides the priority instead of the FW. That way we can still keep the Get API.

On point 3: since the PM layer decides priority, the refcount can be kept at the powerplay and swsmu layer instead of in any higher-level API. The user API may keep something like req_power_profile (for any logging/debug purpose) for the job preference.

On 9/30/2022 2:07 PM, Sharma, Shashank wrote:

No, I think there has been enough confusion around this implementation so far; we will implement this just as Alex/Felix suggested:
- No change will be done in
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/30/2022 2:52 PM, Sharma, Shashank wrote:

*snip - thread quoted in full earlier*

>> No, I think there has been enough confusion around this implementation so far, we will implement this just as Alex/Felix suggested:
>> - No change will be done in pm/SMU layer.
>
> Well, a confusion doesn't justify
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/30/2022 11:13 AM, Lazar, Lijo wrote:

*snip - thread quoted in full earlier*

>> No, I think there has been enough confusion around this implementation so far, we will implement this just as Alex/Felix suggested:
>> - No change will be done in pm/SMU layer.
>
> Well, a confusion doesn't justify bad implementation. You could just keep the refcount in
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/30/2022 2:07 PM, Sharma, Shashank wrote:

*snip - thread quoted in full earlier*

> No, I think there has been enough confusion around this implementation so far, we will implement this just as Alex/Felix suggested:
> - No change will be done in pm/SMU layer.

Well, a confusion doesn't justify bad implementation. You could just keep the refcount in workload_setting. Another API that uses power
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/30/2022 7:08 AM, Lazar, Lijo wrote:

*snip - thread quoted in full earlier*

No, I think there has been enough confusion around this implementation so far; we will implement this just as Alex/Felix suggested:
- No change will be done in pm/SMU layer.
- The amdgpu_context_workload layer will keep the ref_counting and user_workload_hint management, and it will just call and consume the pm_switch_workload_profile() like any other client.
- We will
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/30/2022 12:02 AM, Alex Deucher wrote:

*snip - thread quoted in full earlier*

Now, when we have a single workload:
-> Job1: requests profile P1 via UAPI, ref count = 1
-> driver sends flags for P1
-> PM FW applies profile P1
-> Job executes in profile P1
-> Job goes to reset function, ref_count = 0
-> Power profile resets

Now, we have conflicts only when we see multiple workloads (Job1 and Job2):
-> Job1: requests profile P1 via UAPI, ref count = 1
->
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/29/2022 11:37 PM, Felix Kuehling wrote: On 2022-09-29 07:10, Lazar, Lijo wrote: On 9/29/2022 2:18 PM, Sharma, Shashank wrote: On 9/28/2022 11:51 PM, Alex Deucher wrote: On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank wrote: On 9/27/2022 10:40 PM, Alex Deucher wrote: On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank wrote: On 9/27/2022 5:23 PM, Felix Kuehling wrote: Am 2022-09-27 um 10:58 schrieb Sharma, Shashank: Hello Felix, Thank for the review comments. On 9/27/2022 4:48 PM, Felix Kuehling wrote: Am 2022-09-27 um 02:12 schrieb Christian König: Am 26.09.22 um 23:40 schrieb Shashank Sharma: This patch switches the GPU workload mode to/from compute mode, while submitting compute workload. Signed-off-by: Alex Deucher Signed-off-by: Shashank Sharma Feel free to add my acked-by, but Felix should probably take a look as well. This look OK purely from a compute perspective. But I'm concerned about the interaction of compute with graphics or multiple graphics contexts submitting work concurrently. They would constantly override or disable each other's workload hints. For example, you have an amdgpu_ctx with AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD process that also wants the compute profile. Those could be different processes belonging to different users. Say, KFD enables the compute profile first. Then the graphics context submits a job. At the start of the job, the compute profile is enabled. That's a no-op because KFD already enabled the compute profile. When the job finishes, it disables the compute profile for everyone, including KFD. That's unexpected. In this case, it will not disable the compute profile, as the reference counter will not be zero. The reset_profile() will only act if the reference counter is 0. OK, I missed the reference counter. 
But I would be happy to get any inputs about a policy which can be more sustainable and gets better outputs, for example: - should we not allow a profile change, if a PP mode is already applied and keep it Early bird basis ? For example: Policy A - Job A sets the profile to compute - Job B tries to set profile to 3D, but we do not allow it as job A is not finished it yet. Or Policy B: Current one - Job A sets the profile to compute - Job B tries to set profile to 3D, and we allow it. Job A also runs in PP 3D - Job B finishes, but does not reset PP as reference count is not zero due to compute - Job A finishes, profile reset to NONE I think this won't work. As I understand it, the amdgpu_dpm_switch_power_profile enables and disables individual profiles. Disabling the 3D profile doesn't disable the compute profile at the same time. I think you'll need one refcount per profile. Regards, Felix Thanks, This is exactly what I was looking for, I think Alex's initial idea was around it, but I was under the assumption that there is only one HW profile in SMU which keeps on getting overwritten. This can solve our problems, as I can create an array of reference counters, and will disable only the profile whose reference counter goes 0. It's been a while since I paged any of this code into my head, but I believe the actual workload message in the SMU is a mask where you can specify multiple workload types at the same time and the SMU will arbitrate between them internally. E.g., the most aggressive one will be selected out of the ones specified. I think in the driver we just set one bit at a time using the current interface. It might be better to change the interface and just ref count the hint types and then when we call the set function look at the ref counts for each hint type and set the mask as appropriate. Alex Hey Alex, Thanks for your comment, if that is the case, this current patch series works straight forward, and no changes would be required. 
Please let me know if my understanding is correct: Assumption: Order of aggression: 3D > Media > Compute - Job 1: Requests mode compute: PP changed to compute, ref count 1 - Job 2: Requests mode media: PP changed to media, ref count 2 - Job 3: Requests mode 3D: PP changed to 3D, ref count 3 - Job 1 finishes, drops ref count to 2, doesn't reset the PP as ref > 0, PP still 3D - Job 3 finishes, drops ref count to 1, doesn't reset the PP as ref > 0, PP still 3D - Job 2 finishes, drops ref count to 0, PP changed to NONE. In this way, every job will be operating in the power profile of desired aggression or higher, and this API guarantees execution at least in the desired power profile. I'm not entirely sure on the relative levels of aggression, but I believe the SMU prioritizes them by index. E.g. #define WORKLOAD_PPLIB_DEFAULT_BIT 0 #define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1 #define WORKLOAD_PPLIB_POWER_SAVING_BIT 2 #define WORKLOAD_PPLIB_VIDEO_BIT 3 #define WORKLOAD_PPLIB_VR_BIT 4 #define WORKLOAD_PPLIB_COMPUTE_BIT 5 #define WORKLOAD_PPLIB_CUSTOM_BIT 6 3D < video < VR < compute < custom
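The per-profile reference counting proposed above can be sketched in plain C. This is an illustrative model, not actual amdgpu code: the names (`wl_refcount`, `wl_effective_profile`, `wl_get`, `wl_put`) are hypothetical, and it assumes the index-ordered priority described here (higher bit index = more aggressive), not the "3D > Media > Compute" ordering assumed earlier in the message.

```c
#include <assert.h>

/* Workload bits mirroring the quoted PPLib defines; in this sketch a
 * higher index means a more aggressive profile. */
enum {
	WL_DEFAULT = 0,
	WL_FULLSCREEN_3D,
	WL_POWER_SAVING,
	WL_VIDEO,
	WL_VR,
	WL_COMPUTE,
	WL_CUSTOM,
	WL_COUNT
};

/* Hypothetical per-profile reference counters, one per workload bit. */
static int wl_refcount[WL_COUNT];

/* Highest-priority profile with a nonzero refcount, or WL_DEFAULT if
 * no job has requested anything. */
static int wl_effective_profile(void)
{
	int i;

	for (i = WL_COUNT - 1; i > WL_DEFAULT; i--)
		if (wl_refcount[i] > 0)
			return i;
	return WL_DEFAULT;
}

/* A job requests a profile; returns the profile now in effect. */
static int wl_get(int profile)
{
	wl_refcount[profile]++;
	return wl_effective_profile();
}

/* A job releases its profile; returns the profile now in effect. */
static int wl_put(int profile)
{
	wl_refcount[profile]--;
	return wl_effective_profile();
}
```

With index-based priority, the three-job walkthrough plays out differently from the 3D-first assumption: compute stays in effect until its last user drops out, then video, then default.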
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On Thu, Sep 29, 2022 at 10:14 AM Lazar, Lijo wrote: > > > > On 9/29/2022 7:30 PM, Sharma, Shashank wrote: > > > > > > On 9/29/2022 3:37 PM, Lazar, Lijo wrote: > >> To be clear your understanding - > >> > >> Nothing is automatic in PMFW. PMFW picks a priority based on the > >> actual mask sent by driver. > >> > >> Assuming lower bits correspond to the highest priority - > >> > >> If driver sends a mask with Bit3 and Bit0 set, PMFW will choose the > >> profile that corresponds to Bit0. If driver sends a mask with Bit4 and > >> Bit2 set and the rest unset, PMFW will choose the profile that corresponds to > >> Bit2. However, if driver sends a mask with only a single bit set, it > >> chooses that profile regardless of whatever the previous profile was. It > >> doesn't check if the existing profile > newly requested one. That is > >> the behavior. > >> > >> So if a job chooses a profile that corresponds to Bit0, driver > >> will send that. Next time, if another job chooses a profile that > >> corresponds to Bit1, PMFW will receive that as the new profile and > >> switch to that. It trusts the driver to send the proper workload mask. > >> > >> Hope that gives the picture. > >> > > > > > > Thanks, my understanding is also similar, referring to the core power > > switch profile function here: > > amd_powerplay.c::pp_dpm_switch_power_profile() > > *snip code* > > hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]); > > index = fls(hwmgr->workload_mask); > > index = index <= Workload_Policy_Max ? index - 1 : 0; > > workload = hwmgr->workload_setting[index]; > > *snip_code* > > hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, , 0); > > > > Here I can see that the new workload mask is appended into the existing > > workload mask (not overwritten). So if we keep sending new > > workload_modes, they would be appended into the workload flags and > > finally the PM will pick the most aggressive one of all these flags, as > > per its policy. 
> > > > Actually it's misleading - > > The path for sienna is - > set_power_profile_mode -> sienna_cichlid_set_power_profile_mode > > > This code here is picking one based on a lookup table. > > workload_type = smu_cmn_to_asic_specific_index(smu, > > CMN2ASIC_MAPPING_WORKLOAD, > > smu->power_profile_mode); > > This is that lookup table - > > static struct cmn2asic_mapping > sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = { > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT, > WORKLOAD_PPLIB_DEFAULT_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D, > WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING, > WORKLOAD_PPLIB_POWER_SAVING_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO, > WORKLOAD_PPLIB_VIDEO_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR, > WORKLOAD_PPLIB_VR_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE, > WORKLOAD_PPLIB_COMPUTE_BIT), > WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM, > WORKLOAD_PPLIB_CUSTOM_BIT), > }; > > > And this is the place of interaction with PMFW. (1 << workload_type) is > the mask being sent. > > smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask, > 1 << workload_type, NULL); > > In the end, the driver implementation expects only one bit to be set. Shashank and I had a discussion about this today. I think there are a few things we can do to handle this better: 1. Set a flag so that if the user changes the default via sysfs, that overrides any runtime setting from an application, since presumably that is what the user wants, and we won't change the hint at runtime. 2. Drop the GET API. There's no need for this; the hint is just a hint. 3. Have the driver arbitrate between the available workload profiles based on the numeric value of the hint (e.g., default < 3D < video < VR < compute), as the higher values are more aggressive in most cases. If requests come in for 3D and compute at the same time, the driver will select compute because its value is highest. 
Each hint type would be ref counted so we'd know what state to be in every time we go to set the state. If all of the clients requesting compute go away, and only 3D requestors remain, we can switch to 3D. If all refcounts go to 0, we go back to default. This will not require any change to the current workload API in the SMU code. Alex > > Thanks, > Lijo > > > Now, when we have a single workload: > > -> Job1: requests profile P1 via UAPI, ref count = 1 > > -> driver sends flags for p1 > > -> PM FW applies profile P1 > > -> Job executes in profile P1 > > -> Job goes to reset function, ref_count = 0, > > -> Power profile resets > > > > Now, we have conflicts only when we see multiple workloads (Job1 and Job 2) > > -> Job1: requests profile P1 via UAPI, ref count = 1 > > -> driver sends flags for p1 > > -> PM FW applies profile P1 > > -> Job executes in profile P1 > > -> Job2: requests profile P2 via UAPI, refcount = 2 > > ->
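The `fls()`-based selection quoted from pp_dpm_switch_power_profile() can be illustrated with a minimal stand-alone sketch (plain C outside the kernel; `my_fls` stands in for the kernel's `fls()`, and the `workload_prority[]` indirection is omitted for brevity):

```c
#include <assert.h>

/* Stand-in for the kernel's fls(): 1-based index of the highest set
 * bit, 0 if no bit is set. */
static int my_fls(unsigned int x)
{
	int i = 0;

	while (x) {
		i++;
		x >>= 1;
	}
	return i;
}

/* Mirrors the quoted snippet: OR the newly requested bit into the
 * accumulated workload mask, then pick the highest set bit, i.e. the
 * most aggressive of all requested profiles. */
static int pick_workload_index(unsigned int *mask, int requested_bit)
{
	*mask |= 1u << requested_bit;
	return my_fls(*mask) - 1;	/* convert 1-based fls() to a bit index */
}
```

This shows the "append, most aggressive wins" behavior under discussion: once the compute bit is in the mask, later lower-priority requests do not change the selected index. Note, however, that only `(1 << workload_type)` — a single bit — is what ultimately reaches the PMFW in the sienna path.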
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/29/2022 4:14 PM, Lazar, Lijo wrote: On 9/29/2022 7:30 PM, Sharma, Shashank wrote: On 9/29/2022 3:37 PM, Lazar, Lijo wrote: To be clear your understanding - Nothing is automatic in PMFW. PMFW picks a priority based on the actual mask sent by driver. Assuming lower bits correspond to the highest priority - If driver sends a mask with Bit3 and Bit0 set, PMFW will choose the profile that corresponds to Bit0. If driver sends a mask with Bit4 and Bit2 set and the rest unset, PMFW will choose the profile that corresponds to Bit2. However, if driver sends a mask with only a single bit set, it chooses that profile regardless of whatever the previous profile was. It doesn't check if the existing profile > newly requested one. That is the behavior. So if a job chooses a profile that corresponds to Bit0, driver will send that. Next time, if another job chooses a profile that corresponds to Bit1, PMFW will receive that as the new profile and switch to that. It trusts the driver to send the proper workload mask. Hope that gives the picture. Thanks, my understanding is also similar, referring to the core power switch profile function here: amd_powerplay.c::pp_dpm_switch_power_profile() *snip code* hwmgr->workload_mask |= (1 << hwmgr->workload_prority[type]); index = fls(hwmgr->workload_mask); index = index <= Workload_Policy_Max ? index - 1 : 0; workload = hwmgr->workload_setting[index]; *snip_code* hwmgr->hwmgr_func->set_power_profile_mode(hwmgr, , 0); Here I can see that the new workload mask is appended into the existing workload mask (not overwritten). So if we keep sending new workload_modes, they would be appended into the workload flags and finally the PM will pick the most aggressive one of all these flags, as per its policy. Actually it's misleading - The path for sienna is - set_power_profile_mode -> sienna_cichlid_set_power_profile_mode This code here is picking one based on a lookup table. 
workload_type = smu_cmn_to_asic_specific_index(smu, CMN2ASIC_MAPPING_WORKLOAD, smu->power_profile_mode); This is that lookup table - static struct cmn2asic_mapping sienna_cichlid_workload_map[PP_SMC_POWER_PROFILE_COUNT] = { WORKLOAD_MAP(PP_SMC_POWER_PROFILE_BOOTUP_DEFAULT, WORKLOAD_PPLIB_DEFAULT_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_FULLSCREEN3D, WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_POWERSAVING, WORKLOAD_PPLIB_POWER_SAVING_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VIDEO, WORKLOAD_PPLIB_VIDEO_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_VR, WORKLOAD_PPLIB_VR_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_COMPUTE, WORKLOAD_PPLIB_COMPUTE_BIT), WORKLOAD_MAP(PP_SMC_POWER_PROFILE_CUSTOM, WORKLOAD_PPLIB_CUSTOM_BIT), }; And this is the place of interaction with PMFW. (1 << workload_type) is the mask being sent. smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetWorkloadMask, 1 << workload_type, NULL); In the end, the driver implementation expects only one bit to be set. Well, this seems like a bug in the core functions: the powerplay layer does the right thing by appending the workload flags, keeping in mind that a profile change can be requested while another profile is active, but the core functions actually ignore those flags. This brings us to the actual PMFW expectations. If it expects only one flag to be set in the power-mode change message, we don't need to worry about this anymore. But if it can handle more than one flag while the core driver implementation is blocking it, we will have to fix that as well. @Alex: How can we get more information on this? 
- Shashank Thanks, Lijo Now, when we have a single workload: -> Job1: requests profile P1 via UAPI, ref count = 1 -> driver sends flags for P1 -> PM FW applies profile P1 -> Job executes in profile P1 -> Job goes to reset function, ref count = 0 -> Power profile resets Now, we have conflicts only when we see multiple workloads (Job1 and Job2) -> Job1: requests profile P1 via UAPI, ref count = 1 -> driver sends flags for P1 -> PM FW applies profile P1 -> Job executes in profile P1 -> Job2: requests profile P2 via UAPI, ref count = 2 -> driver sends flags for (P1|P2) -> PM FW picks the more aggressive of the two (say P1, stays in P1) -> Job1 goes to reset function, ref count = 1, Job1 does not reset power profile -> Job2 goes to reset function, ref count = 0, Job2 resets power profile -> Power profile resets to None So this state machine looks like: if there is only 1 job, it will be executed in the desired mode. But if there are multiple, the most aggressive profile will be picked, and every job will be executed in at least the requested power profile mode or higher. Do you find any problem so far? - Shashank Thanks, Lijo
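The two-job sequence above can be sketched as driver-side bookkeeping. This is a hypothetical model (names like `job_begin`/`job_end` are illustrative, not amdgpu symbols); the returned value is the mask that would be sent via SMU_MSG_SetWorkloadMask at each step:

```c
#include <assert.h>

#define WL_MAX 8

static int refcount[WL_MAX];		/* one reference counter per profile */
static unsigned int active_mask;	/* accumulated workload mask */

/* A job requests a profile: bump its refcount and set its bit. */
static unsigned int job_begin(int profile_bit)
{
	refcount[profile_bit]++;
	active_mask |= 1u << profile_bit;
	return active_mask;		/* mask to send to the PMFW */
}

/* A job finishes: clear the bit only when the last reference to that
 * profile is dropped, so remaining jobs keep their requested profile. */
static unsigned int job_end(int profile_bit)
{
	if (--refcount[profile_bit] == 0)
		active_mask &= ~(1u << profile_bit);
	return active_mask;
}
```

With P1 = bit 5 (compute) and P2 = bit 1 (3D), Job1's completion clears only the compute bit, leaving Job2's 3D bit set; the mask returns to zero only when the last job finishes, matching the "resets to None" step above.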
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/28/2022 11:51 PM, Alex Deucher wrote: On Wed, Sep 28, 2022 at 4:57 AM Sharma, Shashank wrote: On 9/27/2022 10:40 PM, Alex Deucher wrote: On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank wrote: On 9/27/2022 5:23 PM, Felix Kuehling wrote: Am 2022-09-27 um 10:58 schrieb Sharma, Shashank: Hello Felix, Thank for the review comments. On 9/27/2022 4:48 PM, Felix Kuehling wrote: Am 2022-09-27 um 02:12 schrieb Christian König: Am 26.09.22 um 23:40 schrieb Shashank Sharma: This patch switches the GPU workload mode to/from compute mode, while submitting compute workload. Signed-off-by: Alex Deucher Signed-off-by: Shashank Sharma Feel free to add my acked-by, but Felix should probably take a look as well. This look OK purely from a compute perspective. But I'm concerned about the interaction of compute with graphics or multiple graphics contexts submitting work concurrently. They would constantly override or disable each other's workload hints. For example, you have an amdgpu_ctx with AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD process that also wants the compute profile. Those could be different processes belonging to different users. Say, KFD enables the compute profile first. Then the graphics context submits a job. At the start of the job, the compute profile is enabled. That's a no-op because KFD already enabled the compute profile. When the job finishes, it disables the compute profile for everyone, including KFD. That's unexpected. In this case, it will not disable the compute profile, as the reference counter will not be zero. The reset_profile() will only act if the reference counter is 0. OK, I missed the reference counter. But I would be happy to get any inputs about a policy which can be more sustainable and gets better outputs, for example: - should we not allow a profile change, if a PP mode is already applied and keep it Early bird basis ? 
For example: Policy A - Job A sets the profile to compute - Job B tries to set profile to 3D, but we do not allow it as job A is not finished it yet. Or Policy B: Current one - Job A sets the profile to compute - Job B tries to set profile to 3D, and we allow it. Job A also runs in PP 3D - Job B finishes, but does not reset PP as reference count is not zero due to compute - Job A finishes, profile reset to NONE I think this won't work. As I understand it, the amdgpu_dpm_switch_power_profile enables and disables individual profiles. Disabling the 3D profile doesn't disable the compute profile at the same time. I think you'll need one refcount per profile. Regards, Felix Thanks, This is exactly what I was looking for, I think Alex's initial idea was around it, but I was under the assumption that there is only one HW profile in SMU which keeps on getting overwritten. This can solve our problems, as I can create an array of reference counters, and will disable only the profile whose reference counter goes 0. It's been a while since I paged any of this code into my head, but I believe the actual workload message in the SMU is a mask where you can specify multiple workload types at the same time and the SMU will arbitrate between them internally. E.g., the most aggressive one will be selected out of the ones specified. I think in the driver we just set one bit at a time using the current interface. It might be better to change the interface and just ref count the hint types and then when we call the set function look at the ref counts for each hint type and set the mask as appropriate. Alex Hey Alex, Thanks for your comment, if that is the case, this current patch series works straight forward, and no changes would be required. 
Please let me know if my understanding is correct: Assumption: Order of aggression: 3D > Media > Compute - Job 1: Requests mode compute: PP changed to compute, ref count 1 - Job 2: Requests mode media: PP changed to media, ref count 2 - Job 3: requests mode 3D: PP changed to 3D, ref count 3 - Job 1 finishes, downs ref count to 2, doesn't reset the PP as ref > 0, PP still 3D - Job 3 finishes, downs ref count to 1, doesn't reset the PP as ref > 0, PP still 3D - Job 2 finishes, downs ref count to 0, PP changed to NONE, In this way, every job will be operating in the Power profile of desired aggression or higher, and this API guarantees the execution at-least in the desired power profile. I'm not entirely sure on the relative levels of aggression, but I believe the SMU priorities them by index. E.g. #define WORKLOAD_PPLIB_DEFAULT_BIT0 #define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1 #define WORKLOAD_PPLIB_POWER_SAVING_BIT 2 #define WORKLOAD_PPLIB_VIDEO_BIT 3 #define WORKLOAD_PPLIB_VR_BIT 4 #define WORKLOAD_PPLIB_COMPUTE_BIT5 #define WORKLOAD_PPLIB_CUSTOM_BIT 6 3D < video < VR < compute < custom VR and compute are the most aggressive. Custom takes preference because it's user customizable. Alex Thanks, so this
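If the SMU does arbitrate by bit index as described here, the selection rule can be sketched as follows. This is an illustrative sketch with hypothetical names, mirroring the fls()-style pick quoted from pp_dpm_switch_power_profile() earlier in the thread: the highest set bit in the composite mask wins.

```c
#include <stdint.h>

/* Hypothetical mirror of the WORKLOAD_PPLIB_* bit layout quoted above. */
enum {
	WL_DEFAULT      = 0,
	WL_3D           = 1,
	WL_POWER_SAVING = 2,
	WL_VIDEO        = 3,
	WL_VR           = 4,
	WL_COMPUTE      = 5,
	WL_CUSTOM       = 6,
};

/* Pick the winning profile from a composite mask: the highest set bit
 * wins, i.e. index-based priority (custom > compute > VR > video > 3D). */
static int pick_profile(uint32_t mask)
{
	int winner = WL_DEFAULT;

	for (int i = 0; i <= WL_CUSTOM; i++)
		if (mask & (1u << i))
			winner = i;
	return winner;
}
```

Note this ordering contradicts the "3D > Media > Compute" assumption in the scenario above: by index, compute outranks video, which outranks full-screen 3D.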
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
Small correction,

On 9/28/2022 10:56 AM, Sharma, Shashank wrote:
On 9/27/2022 10:40 PM, Alex Deucher wrote:
On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank wrote:
On 9/27/2022 5:23 PM, Felix Kuehling wrote:
On 2022-09-27 at 10:58, Sharma, Shashank wrote:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:
On 2022-09-27 at 02:12, Christian König wrote:
On 26.09.22 at 23:40, Shashank Sharma wrote:

This patch switches the GPU workload mode to/from compute mode while submitting a compute workload.

Signed-off-by: Alex Deucher
Signed-off-by: Shashank Sharma

Feel free to add my acked-by, but Felix should probably take a look as well.

This looks OK purely from a compute perspective. But I'm concerned about the interaction of compute with graphics, or multiple graphics contexts submitting work concurrently. They would constantly override or disable each other's workload hints.

For example, you have an amdgpu_ctx with AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD process that also wants the compute profile. Those could be different processes belonging to different users. Say KFD enables the compute profile first. Then the graphics context submits a job. At the start of the job, the compute profile is enabled. That's a no-op because KFD already enabled the compute profile. When the job finishes, it disables the compute profile for everyone, including KFD. That's unexpected.

In this case, it will not disable the compute profile, as the reference counter will not be zero. The reset_profile() will only act if the reference counter is 0.

OK, I missed the reference counter.

But I would be happy to get any inputs about a policy which can be more sustainable and gets better outputs. For example: should we not allow a profile change if a PP mode is already applied, and keep it on an early-bird basis?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set the profile to 3D, but we do not allow it as job A has not finished yet.

Or Policy B (the current one):
- Job A sets the profile to compute
- Job B tries to set the profile to 3D, and we allow it. Job A also runs in PP 3D
- Job B finishes, but does not reset the PP as the reference count is not zero due to compute
- Job A finishes, profile reset to NONE

I think this won't work. As I understand it, amdgpu_dpm_switch_power_profile enables and disables individual profiles. Disabling the 3D profile doesn't disable the compute profile at the same time. I think you'll need one refcount per profile.

Regards,
Felix

Thanks, this is exactly what I was looking for. I think Alex's initial idea was around it, but I was under the assumption that there is only one HW profile in the SMU which keeps getting overwritten. This can solve our problems, as I can create an array of reference counters and disable only the profile whose reference counter goes to 0.

It's been a while since I paged any of this code into my head, but I believe the actual workload message in the SMU is a mask where you can specify multiple workload types at the same time, and the SMU will arbitrate between them internally. E.g., the most aggressive one will be selected out of the ones specified. I think in the driver we just set one bit at a time using the current interface. It might be better to change the interface, just ref count the hint types, and then, when we call the set function, look at the ref counts for each hint type and set the mask as appropriate.

Alex

Hey Alex,
Thanks for your comment. If that is the case, this current patch series works straightforwardly, and no changes would be required. The only change required would be to append the new power profile request to the existing power profile mask, instead of overwriting it. This is where the current state machine pm.workload_mode would be useful.

- Shashank

Please let me know if my understanding is correct:

Assumption: order of aggression: 3D > Media > Compute

- Job 1 requests mode compute: PP changed to compute, ref count 1
- Job 2 requests mode media: PP changed to media, ref count 2
- Job 3 requests mode 3D: PP changed to 3D, ref count 3
- Job 1 finishes, drops ref count to 2, doesn't reset the PP as ref > 0, PP still 3D
- Job 3 finishes, drops ref count to 1, doesn't reset the PP as ref > 0, PP still 3D
- Job 2 finishes, drops ref count to 0, PP changed to NONE

In this way, every job will be operating in the power profile of the desired aggression or higher, and this API guarantees execution at least in the desired power profile.

- Shashank

Or anything else?

Regards,
Shashank

Or you have multiple VCN contexts. When context1 finishes a job, it disables the VIDEO profile. But context2 still has a job on the other VCN engine and wants the VIDEO profile to still be enabled.

Regards,
Felix

Christian.

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++---
 1
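The job sequence above can be replayed with per-profile refcounts feeding a composite mask. This is an illustrative sketch with hypothetical names (begin_job()/end_job(), not amdgpu functions), and it uses the SMU's index-based arbitration, under which compute outranks video, which outranks 3D, rather than the "3D > Media > Compute" order assumed in the scenario.

```c
#include <stdint.h>

/* Illustrative sketch: per-profile refcounts maintain a composite mask;
 * the effective profile is the highest set bit (index-based priority). */
enum { WL_3D = 1, WL_VIDEO = 3, WL_COMPUTE = 5 };

static int refcnt[8];
static uint32_t mask;

static int effective(void)
{
	int w = 0;

	for (int i = 0; i < 8; i++)
		if (mask & (1u << i))
			w = i;
	return w;
}

static int begin_job(int hint)
{
	refcnt[hint]++;
	mask |= 1u << hint;
	return effective();
}

static int end_job(int hint)
{
	/* The bit clears only when the last job using the hint finishes. */
	if (refcnt[hint] > 0 && --refcnt[hint] == 0)
		mask &= ~(1u << hint);
	return effective();
}
```

Replaying the three-job scenario, the effective profile stays at compute until the compute job ends, then falls back to video, and returns to default only when the last job finishes.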
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On 9/28/2022 2:10 AM, Alex Deucher wrote:
On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank wrote:
On 9/27/2022 5:23 PM, Felix Kuehling wrote:
On 2022-09-27 at 10:58, Sharma, Shashank wrote:

Hello Felix,

Thanks for the review comments.

On 9/27/2022 4:48 PM, Felix Kuehling wrote:
On 2022-09-27 at 02:12, Christian König wrote:
On 26.09.22 at 23:40, Shashank Sharma wrote:

This patch switches the GPU workload mode to/from compute mode while submitting a compute workload.

Signed-off-by: Alex Deucher
Signed-off-by: Shashank Sharma

Feel free to add my acked-by, but Felix should probably take a look as well.

This looks OK purely from a compute perspective. But I'm concerned about the interaction of compute with graphics, or multiple graphics contexts submitting work concurrently. They would constantly override or disable each other's workload hints.

For example, you have an amdgpu_ctx with AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD process that also wants the compute profile. Those could be different processes belonging to different users. Say KFD enables the compute profile first. Then the graphics context submits a job. At the start of the job, the compute profile is enabled. That's a no-op because KFD already enabled the compute profile. When the job finishes, it disables the compute profile for everyone, including KFD. That's unexpected.

In this case, it will not disable the compute profile, as the reference counter will not be zero. The reset_profile() will only act if the reference counter is 0.

OK, I missed the reference counter.

But I would be happy to get any inputs about a policy which can be more sustainable and gets better outputs. For example: should we not allow a profile change if a PP mode is already applied, and keep it on an early-bird basis?

For example: Policy A
- Job A sets the profile to compute
- Job B tries to set the profile to 3D, but we do not allow it as job A has not finished yet.

Or Policy B (the current one):
- Job A sets the profile to compute
- Job B tries to set the profile to 3D, and we allow it. Job A also runs in PP 3D
- Job B finishes, but does not reset the PP as the reference count is not zero due to compute
- Job A finishes, profile reset to NONE

I think this won't work. As I understand it, amdgpu_dpm_switch_power_profile enables and disables individual profiles. Disabling the 3D profile doesn't disable the compute profile at the same time. I think you'll need one refcount per profile.

Regards,
Felix

Thanks, this is exactly what I was looking for. I think Alex's initial idea was around it, but I was under the assumption that there is only one HW profile in the SMU which keeps getting overwritten. This can solve our problems, as I can create an array of reference counters and disable only the profile whose reference counter goes to 0.

It's been a while since I paged any of this code into my head, but I believe the actual workload message in the SMU is a mask where you can specify multiple workload types at the same time, and the SMU will arbitrate between them internally. E.g., the most aggressive one will be selected out of the ones specified. I think in the driver we just set one bit at a time using the current interface.

Yes, this is how it works today. Only one profile is set at a time, so setting another one will overwrite the current driver preference. I think the current expectation of usage is from a system settings perspective, like Gaming Mode (full-screen 3D) or Cinematic Mode (video), etc. This is also set through sysfs, and there is also a Custom mode. It's not used in the sense of a per-job setting.

It might be better to change the interface, just ref count the hint types, and then, when we call the set function, look at the ref counts for each hint type and set the mask as appropriate.

This means a pm subsystem level change, and the ref counts need to be kept in the pm layer to account for changes through sysfs or APIs.

Thanks,
Lijo

Alex

- Shashank

Or anything else?

Regards,
Shashank

Or you have multiple VCN contexts. When context1 finishes a job, it disables the VIDEO profile. But context2 still has a job on the other VCN engine and wants the VIDEO profile to still be enabled.

Regards,
Felix

Christian.

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"

 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int
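The interaction between a sysfs-selected profile and runtime job hints discussed here (a user override taking precedence over per-job refcounted hints) can be sketched as a simple precedence rule. All names below are hypothetical, not the amdgpu pm API; the point is only the design choice: a forced user setting pins the mask, and the refcounted job mask applies again once the override is cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: sysfs override vs. refcounted per-job hints. */
static bool user_forced;      /* set when the user picks a mode via sysfs */
static uint32_t user_mask;    /* the user's forced workload mask */
static uint32_t job_mask;     /* mask derived from per-job hint refcounts */

static void job_set(uint32_t m)   { job_mask = m; }
static void sysfs_set(uint32_t m) { user_forced = true; user_mask = m; }
static void sysfs_clear(void)     { user_forced = false; }

/* The mask that would actually be sent to the firmware. */
static uint32_t workload_mask_to_send(void)
{
	return user_forced ? user_mask : job_mask;
}
```

Keeping both states in one place (the pm layer, as suggested above) is what makes this precedence enforceable regardless of whether the change arrived via sysfs or a driver API.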
Re: [PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
On Tue, Sep 27, 2022 at 11:38 AM Sharma, Shashank wrote:
>
> On 9/27/2022 5:23 PM, Felix Kuehling wrote:
> > Am 2022-09-27 um 10:58 schrieb Sharma, Shashank:
> > > Hello Felix,
> > >
> > > Thanks for the review comments.
> > >
> > > On 9/27/2022 4:48 PM, Felix Kuehling wrote:
> > > > Am 2022-09-27 um 02:12 schrieb Christian König:
> > > > > Am 26.09.22 um 23:40 schrieb Shashank Sharma:
> > > > > > This patch switches the GPU workload mode to/from compute
> > > > > > mode while submitting a compute workload.
> > > > > >
> > > > > > Signed-off-by: Alex Deucher
> > > > > > Signed-off-by: Shashank Sharma
> > > > >
> > > > > Feel free to add my acked-by, but Felix should probably take a
> > > > > look as well.
> > > >
> > > > This looks OK purely from a compute perspective. But I'm
> > > > concerned about the interaction of compute with graphics, or
> > > > multiple graphics contexts submitting work concurrently. They
> > > > would constantly override or disable each other's workload hints.
> > > >
> > > > For example, you have an amdgpu_ctx with
> > > > AMDGPU_CTX_WORKLOAD_HINT_COMPUTE (maybe Vulkan compute) and a KFD
> > > > process that also wants the compute profile. Those could be
> > > > different processes belonging to different users. Say KFD enables
> > > > the compute profile first. Then the graphics context submits a
> > > > job. At the start of the job, the compute profile is enabled.
> > > > That's a no-op because KFD already enabled the compute profile.
> > > > When the job finishes, it disables the compute profile for
> > > > everyone, including KFD. That's unexpected.
> > >
> > > In this case it will not disable the compute profile, as the
> > > reference counter will not be zero. The reset_profile() will only
> > > act if the reference counter is 0.
> >
> > OK, I missed the reference counter.
> >
> > > But I would be happy to get any input on a policy which is more
> > > sustainable and gives better results, for example:
> > > - should we disallow a profile change if a PP mode is already
> > >   applied, and keep it on an early-bird basis?
> > >
> > > For example, Policy A:
> > > - Job A sets the profile to compute.
> > > - Job B tries to set the profile to 3D, but we do not allow it, as
> > >   Job A has not finished yet.
> > >
> > > Or Policy B (the current one):
> > > - Job A sets the profile to compute.
> > > - Job B tries to set the profile to 3D, and we allow it. Job A also
> > >   runs in PP 3D.
> > > - Job B finishes, but does not reset the PP, as the reference count
> > >   is not zero due to compute.
> > > - Job A finishes; the profile resets to NONE.
> >
> > I think this won't work. As I understand it,
> > amdgpu_dpm_switch_power_profile enables and disables individual
> > profiles. Disabling the 3D profile doesn't disable the compute
> > profile at the same time. I think you'll need one refcount per
> > profile.
> >
> > Regards,
> >   Felix
>
> Thanks, this is exactly what I was looking for. I think Alex's initial
> idea was around it, but I was under the assumption that there is only
> one HW profile in the SMU which keeps getting overwritten. This can
> solve our problem, as I can create an array of reference counters and
> disable only the profile whose reference counter goes to 0.

It's been a while since I paged any of this code into my head, but I
believe the actual workload message in the SMU is a mask where you can
specify multiple workload types at the same time, and the SMU will
arbitrate between them internally, e.g. the most aggressive one will be
selected out of the ones specified. I think in the driver we just set
one bit at a time using the current interface. It might be better to
change the interface and just refcount the hint types, and then, when
we call the set function, look at the refcounts for each hint type and
set the mask as appropriate.

Alex

> - Shashank
>
> > > Or anything else?
> > >
> > > Regards,
> > > Shashank
> > >
> > > > Or you have multiple VCN contexts. When context1 finishes a job,
> > > > it disables the VIDEO profile. But context2 still has a job on
> > > > the other VCN engine and wants the VIDEO profile to still be
> > > > enabled.
> > > >
> > > > Regards,
> > > >   Felix
> > > >
> > > > > Christian.
> > > > >
> > > > > > *snip patch*
[PATCH v3 5/5] drm/amdgpu: switch workload context to/from compute
This patch switches the GPU workload mode to/from compute mode while
submitting a compute workload.

Signed-off-by: Alex Deucher
Signed-off-by: Shashank Sharma
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 5e53a5293935..1caed319a448 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -34,6 +34,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_umc.h"
 #include "amdgpu_reset.h"
+#include "amdgpu_ctx_workload.h"

 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -703,9 +704,16 @@ int amdgpu_amdkfd_submit_ib(struct amdgpu_device *adev,

 void amdgpu_amdkfd_set_compute_idle(struct amdgpu_device *adev, bool idle)
 {
-	amdgpu_dpm_switch_power_profile(adev,
-					PP_SMC_POWER_PROFILE_COMPUTE,
-					!idle);
+	int ret;
+
+	if (idle)
+		ret = amdgpu_clear_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+	else
+		ret = amdgpu_set_workload_profile(adev, AMDGPU_CTX_WORKLOAD_HINT_COMPUTE);
+
+	if (ret)
+		drm_warn(&adev->ddev, "Failed to %s power profile to compute mode\n",
+			 idle ? "reset" : "set");
 }

 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
--
2.34.1