Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-12 Thread Powell, Darren
[AMD Official Use Only]

Yes, it looks like I was a little snippy writing that intro, will lighten the 
grammar.
Thanks
Darren

From: Tuikov, Luben 
Sent: Tuesday, April 12, 2022 5:09 PM
To: Powell, Darren ; amd-gfx@lists.freedesktop.org 

Cc: Quan, Evan ; pmen...@molgen.mpg.de 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in 
send_smc_mesg

I suppose I didn't quite register this on a first read:

On 2022-04-12 00:08, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two

I'd just say

  Clarify the documentation to also mention that we drop
  messages and return success in the following two cases:
 1. ...

That "Contrary to the ..." sounds a bit harsh.

Regards,
Luben

> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>  1. the message target is a virtual GPU, or
>  2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
> For more details see
>  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")
>
> (v2)
>   Reworked with suggestions from Luben & Paul
>
> Signed-off-by: Darren Powell 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
>   * the error to the standard log.
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
--
Luben


Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-12 Thread Powell, Darren
[AMD Official Use Only]

I needed to dig further down to find the message map, I had been looking back 
in mailing list looking for clarification but hadn't found anything.
Will reword
Thanks
Darren

From: Lazar, Lijo 
Sent: Tuesday, April 12, 2022 12:19 AM
To: Powell, Darren ; amd-gfx@lists.freedesktop.org 

Cc: Tuikov, Luben ; Quan, Evan ; 
Grodzovsky, Andrey ; pmen...@molgen.mpg.de 

Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in 
send_smc_mesg



On 4/12/2022 9:38 AM, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>   1. the message target is a virtual GPU, or

This is not fully correct - only messages which are not valid for
virtual GPU are dropped, not all.

Thanks,
Lijo

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
> For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")
>
> (v2)
>Reworked with suggestions from Luben & Paul
>
> Signed-off-by: Darren Powell 
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>* completion of the command, and return back a value from the SMU in
>* @read_arg pointer.
>*
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>*
>* If we weren't able to send the message to the SMU, we also print
>* the error to the standard log.
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
>


Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-12 Thread Luben Tuikov
I suppose I didn't quite register this on a first read:

On 2022-04-12 00:08, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two

I'd just say

  Clarify the documentation to also mention that we drop
  messages and return success in the following two cases:
 1. ...

That "Contrary to the ..." sounds a bit harsh.

Regards,
Luben

> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>  1. the message target is a virtual GPU, or
>  2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
> For more details see
>  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")
> 
> (v2)
>   Reworked with suggestions from Luben & Paul
> 
> Signed-off-by: Darren Powell 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
>   * the error to the standard log.
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
-- 
Luben


Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-11 Thread Lazar, Lijo




On 4/12/2022 9:38 AM, Darren Powell wrote:

Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
cases exist where messages are silently dropped with no error returned
to the caller. These cases occur in unusual situations where either:
  1. the message target is a virtual GPU, or


This is not fully correct - only messages which are not valid for 
virtual GPU are dropped, not all.


Thanks,
Lijo


  2. a PCI recovery is underway and the HW is not yet in sync with the SW

For more details see
  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
state")

(v2)
   Reworked with suggestions from Luben & Paul

Signed-off-by: Darren Powell 
---
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..8008ae5508e6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
   * completion of the command, and return back a value from the SMU in
   * @read_arg pointer.
   *
- * Return 0 on success, -errno on error, if we weren't able to send
- * the message or if the message completed with some kind of
- * error. See __smu_cmn_reg2errno() for details of the -errno.
+ * Return 0 on success, -errno when a problem is encountered sending
+ * message or receiving reply. If there is a PCI bus recovery or
+ * the destination is a virtual GPU, the message is simply dropped and
+ * success is also returned.
+ * See __smu_cmn_reg2errno() for details of the -errno.
   *
   * If we weren't able to send the message to the SMU, we also print
   * the error to the standard log.

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7



Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

2022-04-08 Thread Powell, Darren
[AMD Official Use Only]

will respin and incorporate your suggestions
Thanks,
Darren

From: Paul Menzel 
Sent: Friday, April 8, 2022 2:29 AM
To: Powell, Darren 
Cc: amd-gfx@lists.freedesktop.org ; Tuikov, 
Luben ; Quan, Evan ; 
wenhui.sh...@amd.com ; Grodzovsky, Andrey 

Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in 
send_smc_mesg

Dear Darren,


Thank you for your patch.

Am 08.04.22 um 04:26 schrieb Darren Powell:
>   Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>   cases exist where messages are silently dropped with no error returned
>   to the caller. These cases occur in unusual situations where either:
>1. the caller is a virtual GPU, or
>2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
>   For more details see
>commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")

Please remove the indentation. If you re-rolled the patch, you could
also spell *documentation* lowercase in the commit message summary.

> Signed-off-by: Darren Powell 
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>* completion of the command, and return back a value from the SMU in
>* @read_arg pointer.
>*
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send
>* the message or if the message completed with some kind of
>* error. See __smu_cmn_reg2errno() for details of the -errno.
>*
>* If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.
>*
>* Command completion status is printed only if the -errno is
>* -EREMOTEIO, indicating that the SMU returned back an
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

The diff looks good – despite Mozilla Thunderbird quoting it strangely.


Kind regards,

Paul


Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

2022-04-08 Thread Powell, Darren
[AMD Official Use Only]

inline


From: Tuikov, Luben 
Sent: Friday, April 8, 2022 9:33 AM
To: Powell, Darren ; amd-gfx@lists.freedesktop.org 

Cc: Quan, Evan ; wenhui.sh...@amd.com 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in 
send_smc_mesg

I'd add who and how is the message dropped, and also mention that we're unable
to recognize a dropped message.

On 2022-04-07 22:26, Darren Powell wrote:
>  Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>  cases exist where messages are silently dropped with no error returned
>  to the caller. These cases occur in unusual situations where either:
>   1. the caller is a virtual GPU, or

The caller? Isn't this code executed on a CPU sending to the SMU (which lives 
on a GPU)?
[DP] Great point, will fix

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
>  For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")
>
> Signed-off-by: Darren Powell 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send

Something like this:

  Return 0 on success, -errno on error. If the message was dropped
  due to PCI bus recovery or sending to a virtual GPU, we're unable
  to detect this and success is also returned.

>   * the message or if the message completed with some kind of
>   * error. See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.

This is a moot point with the clarification I suggested above and I'd remove 
that.
[DP] sounds more succinct, will address in v2

>   *
>   * Command completion status is printed only if the -errno is
>   * -EREMOTEIO, indicating that the SMU returned back an
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
--
Luben


Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

2022-04-08 Thread Luben Tuikov
I'd add who and how is the message dropped, and also mention that we're unable
to recognize a dropped message.

On 2022-04-07 22:26, Darren Powell wrote:
>  Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>  cases exist where messages are silently dropped with no error returned
>  to the caller. These cases occur in unusual situations where either:
>   1. the caller is a virtual GPU, or

The caller? Isn't this code executed on a CPU sending to the SMU (which lives 
on a GPU)?

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
>  For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
> state")
> 
> Signed-off-by: Darren Powell 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send

Something like this:

  Return 0 on success, -errno on error. If the message was dropped
  due to PCI bus recovery or sending to a virtual GPU, we're unable
  to detect this and success is also returned.

>   * the message or if the message completed with some kind of
>   * error. See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.

This is a moot point with the clarification I suggested above and I'd remove 
that.

>   *
>   * Command completion status is printed only if the -errno is
>   * -EREMOTEIO, indicating that the SMU returned back an
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
-- 
Luben


Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

2022-04-08 Thread Paul Menzel

Dear Darren,


Thank you for your patch.

Am 08.04.22 um 04:26 schrieb Darren Powell:

  Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
  cases exist where messages are silently dropped with no error returned
  to the caller. These cases occur in unusual situations where either:
   1. the caller is a virtual GPU, or
   2. a PCI recovery is underway and the HW is not yet in sync with the SW

  For more details see
   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW 
state")


Please remove the indentation. If you re-rolled the patch, you could 
also spell *documentation* lowercase in the commit message summary.



Signed-off-by: Darren Powell 
---
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..b1bd1990c88b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
   * completion of the command, and return back a value from the SMU in
   * @read_arg pointer.
   *
- * Return 0 on success, -errno on error, if we weren't able to send
+ * Return 0 on success, or if the message is dropped.
+ * On error, -errno is returned if we weren't able to send
   * the message or if the message completed with some kind of
   * error. See __smu_cmn_reg2errno() for details of the -errno.
   *
   * If we weren't able to send the message to the SMU, we also print
- * the error to the standard log.
+ * the error to the standard log. Dropped messages can be caused
+ * due to PCI slot recovery or attempting to send from a virtual GPU,
+ * and do not print an error.
   *
   * Command completion status is printed only if the -errno is
   * -EREMOTEIO, indicating that the SMU returned back an

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7


The diff looks good – despite Mozilla Thunderbird quoting it strangely.


Kind regards,

Paul