Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
[AMD Official Use Only] Yes, it looks like I was a little snippy writing that intro, will lighten the grammar. Thanks Darren From: Tuikov, Luben Sent: Tuesday, April 12, 2022 5:09 PM To: Powell, Darren ; amd-gfx@lists.freedesktop.org Cc: Quan, Evan ; pmen...@molgen.mpg.de ; Grodzovsky, Andrey Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg I suppose I didn't quite register this on a first read: On 2022-04-12 00:08, Darren Powell wrote: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two I'd just say Clarify the documentation to also mention that we drop messages and return success in the following two cases: 1. ... That "Contrary to the ..." sounds a bit harsh. Regards, Luben > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: > 1. the message target is a virtual GPU, or > 2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see > commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") > commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") > > (v2) > Reworked with suggestions from Luben & Paul > > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..8008ae5508e6 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) > * completion of the command, and return back a value from the SMU in > * @read_arg pointer. > * > - * Return 0 on success, -errno on error, if we weren't able to send > - * the message or if the message completed with some kind of > - * error. See __smu_cmn_reg2errno() for details of the -errno. > + * Return 0 on success, -errno when a problem is encountered sending > + * message or receiving reply. If there is a PCI bus recovery or > + * the destination is a virtual GPU, the message is simply dropped and > + * success is also returned. > + * See __smu_cmn_reg2errno() for details of the -errno. > * > * If we weren't able to send the message to the SMU, we also print > * the error to the standard log. > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 Regards, -- Luben
Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
[AMD Official Use Only] I needed to dig further down to find the message map, I had been looking back in mailing list looking for clarification but hadn't found anything. Will reword Thanks Darren From: Lazar, Lijo Sent: Tuesday, April 12, 2022 12:19 AM To: Powell, Darren ; amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Quan, Evan ; Grodzovsky, Andrey ; pmen...@molgen.mpg.de Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg On 4/12/2022 9:38 AM, Darren Powell wrote: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: > 1. the message target is a virtual GPU, or This is not fully correct - only messages which are not valid for virtual GPU are dropped, not all. Thanks, Lijo > 2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see > commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") > commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") > > (v2) >Reworked with suggestions from Luben & Paul > > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..8008ae5508e6 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) >* completion of the command, and return back a value from the SMU in >* @read_arg pointer. >* > - * Return 0 on success, -errno on error, if we weren't able to send > - * the message or if the message completed with some kind of > - * error. See __smu_cmn_reg2errno() for details of the -errno. > + * Return 0 on success, -errno when a problem is encountered sending > + * message or receiving reply. If there is a PCI bus recovery or > + * the destination is a virtual GPU, the message is simply dropped and > + * success is also returned. > + * See __smu_cmn_reg2errno() for details of the -errno. >* >* If we weren't able to send the message to the SMU, we also print >* the error to the standard log. > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 >
Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
I suppose I didn't quite register this on a first read: On 2022-04-12 00:08, Darren Powell wrote: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two I'd just say Clarify the documentation to also mention that we drop messages and return success in the following two cases: 1. ... That "Contrary to the ..." sounds a bit harsh. Regards, Luben > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: > 1. the message target is a virtual GPU, or > 2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see > commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") > commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") > > (v2) > Reworked with suggestions from Luben & Paul > > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..8008ae5508e6 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) > * completion of the command, and return back a value from the SMU in > * @read_arg pointer. > * > - * Return 0 on success, -errno on error, if we weren't able to send > - * the message or if the message completed with some kind of > - * error. See __smu_cmn_reg2errno() for details of the -errno. > + * Return 0 on success, -errno when a problem is encountered sending > + * message or receiving reply. If there is a PCI bus recovery or > + * the destination is a virtual GPU, the message is simply dropped and > + * success is also returned. > + * See __smu_cmn_reg2errno() for details of the -errno. > * > * If we weren't able to send the message to the SMU, we also print > * the error to the standard log. > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 Regards, -- Luben
Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
On 4/12/2022 9:38 AM, Darren Powell wrote: Contrary to the smu_cmn_send_smc_msg_with_param documentation, two cases exist where messages are silently dropped with no error returned to the caller. These cases occur in unusual situations where either: 1. the message target is a virtual GPU, or This is not fully correct - only messages which are not valid for virtual GPU are dropped, not all. Thanks, Lijo 2. a PCI recovery is underway and the HW is not yet in sync with the SW For more details see commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state") (v2) Reworked with suggestions from Luben & Paul Signed-off-by: Darren Powell --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index b8d0c70ff668..8008ae5508e6 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu) * completion of the command, and return back a value from the SMU in * @read_arg pointer. * - * Return 0 on success, -errno on error, if we weren't able to send - * the message or if the message completed with some kind of - * error. See __smu_cmn_reg2errno() for details of the -errno. + * Return 0 on success, -errno when a problem is encountered sending + * message or receiving reply. If there is a PCI bus recovery or + * the destination is a virtual GPU, the message is simply dropped and + * success is also returned. + * See __smu_cmn_reg2errno() for details of the -errno. * * If we weren't able to send the message to the SMU, we also print * the error to the standard log. base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
[AMD Official Use Only] will respin and incorporate your suggestions Thanks, Darren From: Paul Menzel Sent: Friday, April 8, 2022 2:29 AM To: Powell, Darren Cc: amd-gfx@lists.freedesktop.org ; Tuikov, Luben ; Quan, Evan ; wenhui.sh...@amd.com ; Grodzovsky, Andrey Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg Dear Darren, Thank you for your patch. Am 08.04.22 um 04:26 schrieb Darren Powell: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: >1. the caller is a virtual GPU, or >2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see >commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") >commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") Please remove the indentation. If you re-rolled the patch, you could also spell *documentation* lowercase in the commit message summary. > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..b1bd1990c88b 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu) >* completion of the command, and return back a value from the SMU in >* @read_arg pointer. >* > - * Return 0 on success, -errno on error, if we weren't able to send > + * Return 0 on success, or if the message is dropped. > + * On error, -errno is returned if we weren't able to send >* the message or if the message completed with some kind of >* error. See __smu_cmn_reg2errno() for details of the -errno. >* >* If we weren't able to send the message to the SMU, we also print > - * the error to the standard log. > + * the error to the standard log. Dropped messages can be caused > + * due to PCI slot recovery or attempting to send from a virtual GPU, > + * and do not print an error. >* >* Command completion status is printed only if the -errno is >* -EREMOTEIO, indicating that the SMU returned back an > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 The diff looks good – despite Mozilla Thunderbird quoting it strangely. Kind regards, Paul
Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
[AMD Official Use Only] inline From: Tuikov, Luben Sent: Friday, April 8, 2022 9:33 AM To: Powell, Darren ; amd-gfx@lists.freedesktop.org Cc: Quan, Evan ; wenhui.sh...@amd.com ; Grodzovsky, Andrey Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg I'd add who and how is the message dropped, and also mention that we're unable to recognize a dropped message. On 2022-04-07 22:26, Darren Powell wrote: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: > 1. the caller is a virtual GPU, or The caller? Isn't this code executed on a CPU sending to the SMU (which lives on a GPU)? [DP] Great point, will fix > 2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see > commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") > commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") > > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..b1bd1990c88b 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu) > * completion of the command, and return back a value from the SMU in > * @read_arg pointer. > * > - * Return 0 on success, -errno on error, if we weren't able to send > + * Return 0 on success, or if the message is dropped. > + * On error, -errno is returned if we weren't able to send Something like this: Return 0 on success, -errno on error. If the message was dropped due to PCI bus recovery or sending to a virtual GPU, we're unable to detect this and success is also returned. > * the message or if the message completed with some kind of > * error. See __smu_cmn_reg2errno() for details of the -errno. > * > * If we weren't able to send the message to the SMU, we also print > - * the error to the standard log. > + * the error to the standard log. Dropped messages can be caused > + * due to PCI slot recovery or attempting to send from a virtual GPU, > + * and do not print an error. This is a moot point with the clarification I suggested above and I'd remove that. [DP] sounds more succinct, will address in v2 > * > * Command completion status is printed only if the -errno is > * -EREMOTEIO, indicating that the SMU returned back an > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 Regards, -- Luben
Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
I'd add who and how is the message dropped, and also mention that we're unable to recognize a dropped message. On 2022-04-07 22:26, Darren Powell wrote: > Contrary to the smu_cmn_send_smc_msg_with_param documentation, two > cases exist where messages are silently dropped with no error returned > to the caller. These cases occur in unusual situations where either: > 1. the caller is a virtual GPU, or The caller? Isn't this code executed on a CPU sending to the SMU (which lives on a GPU)? > 2. a PCI recovery is underway and the HW is not yet in sync with the SW > > For more details see > commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") > commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW > state") > > Signed-off-by: Darren Powell > --- > drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > index b8d0c70ff668..b1bd1990c88b 100644 > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c > @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu) > * completion of the command, and return back a value from the SMU in > * @read_arg pointer. > * > - * Return 0 on success, -errno on error, if we weren't able to send > + * Return 0 on success, or if the message is dropped. > + * On error, -errno is returned if we weren't able to send Something like this: Return 0 on success, -errno on error. If the message was dropped due to PCI bus recovery or sending to a virtual GPU, we're unable to detect this and success is also returned. > * the message or if the message completed with some kind of > * error. See __smu_cmn_reg2errno() for details of the -errno. > * > * If we weren't able to send the message to the SMU, we also print > - * the error to the standard log. > + * the error to the standard log. Dropped messages can be caused > + * due to PCI slot recovery or attempting to send from a virtual GPU, > + * and do not print an error. This is a moot point with the clarification I suggested above and I'd remove that. > * > * Command completion status is printed only if the -errno is > * -EREMOTEIO, indicating that the SMU returned back an > > base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 Regards, -- Luben
Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
Dear Darren, Thank you for your patch. Am 08.04.22 um 04:26 schrieb Darren Powell: Contrary to the smu_cmn_send_smc_msg_with_param documentation, two cases exist where messages are silently dropped with no error returned to the caller. These cases occur in unusual situations where either: 1. the caller is a virtual GPU, or 2. a PCI recovery is underway and the HW is not yet in sync with the SW For more details see commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter") commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state") Please remove the indentation. If you re-rolled the patch, you could also spell *documentation* lowercase in the commit message summary. Signed-off-by: Darren Powell --- drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c index b8d0c70ff668..b1bd1990c88b 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu) * completion of the command, and return back a value from the SMU in * @read_arg pointer. * - * Return 0 on success, -errno on error, if we weren't able to send + * Return 0 on success, or if the message is dropped. + * On error, -errno is returned if we weren't able to send * the message or if the message completed with some kind of * error. See __smu_cmn_reg2errno() for details of the -errno. * * If we weren't able to send the message to the SMU, we also print - * the error to the standard log. + * the error to the standard log. Dropped messages can be caused + * due to PCI slot recovery or attempting to send from a virtual GPU, + * and do not print an error. * * Command completion status is printed only if the -errno is * -EREMOTEIO, indicating that the SMU returned back an base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7 The diff looks good – despite Mozilla Thunderbird quoting it strangely. Kind regards, Paul