Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-14 Thread Deucher, Alexander
[AMD Public Use]

No need to apologize, it was a good catch.  Something to double check in the 
future if something similar ends up getting applied again.

Thanks!

Alex

From: amd-gfx  on behalf of Matt Coffin 

Sent: Friday, August 14, 2020 3:13 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Quan, Evan ; Koenig, Christian 
; Li, Dennis 
Subject: Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for 
pp_od_clk_voltage

Hi all,

As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix
system hang issue during GPU reset""), this patch is no longer needed,
and won't apply, because the badly-behaving code was removed; I'll
withdraw this patch for now.

Sorry to those who wasted their time reviewing.

Cheers,
Matt

On 8/13/20 7:15 PM, Matt Coffin wrote:
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an
> issue with the sysfs interface for pp_od_clk_voltage. It overwrites the
> return value to 0 when it calls another function, then returns 0. The
> intended behavior is that a positive return value indicates the number
> of bytes from the buffer that you processed in that call.
>
> With the 0 return value, clients would submit the same value to be
> written over and over again, resulting in an infinite loop.
>
> This is resolved by returning the count of bytes read (in this case the
> whole message), when the desired return is 0 (success).
>
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7Calexander.deucher%40amd.com%7C3fb1fd9f95ad441a88a208d840862e80%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637330292979204288sdata=iABLxfzQVQa5zBK6a0JozvRYl%2Fg5eTFnfOS86r0g9rU%3Dreserved=0
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct 
> device *dev,
>
>  pro_end:
>up_read(>reset_sem);
> - return ret;
> + if (ret) {
> + return ret;
> + } else {
> + return count;
> + }
>  }
>
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Calexander.deucher%40amd.com%7C3fb1fd9f95ad441a88a208d840862e80%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637330292979204288sdata=GLWlbhgo0grFjcOh5ZAJWhXGxSm6Ufw8eDFDHZf95Uk%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-14 Thread Matt Coffin
Hi all,

As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix
system hang issue during GPU reset""), this patch is no longer needed,
and won't apply, because the badly-behaving code was removed; I'll
withdraw this patch for now.

Sorry to those who wasted their time reviewing.

Cheers,
Matt

On 8/13/20 7:15 PM, Matt Coffin wrote:
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an
> issue with the sysfs interface for pp_od_clk_voltage. It overwrites the
> return value to 0 when it calls another function, then returns 0. The
> intended behavior is that a positive return value indicates the number
> of bytes from the buffer that you processed in that call.
> 
> With the 0 return value, clients would submit the same value to be
> written over and over again, resulting in an infinite loop.
> 
> This is resolved by returning the count of bytes read (in this case the
> whole message), when the desired return is 0 (success).
> 
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1245
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct 
> device *dev,
>  
>  pro_end:
>   up_read(>reset_sem);
> - return ret;
> + if (ret) {
> + return ret;
> + } else {
> + return count;
> + }
>  }
>  
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-13 Thread Quan, Evan
[AMD Official Use Only - Internal Distribution Only]

Thanks for catching this. The patch is reviewed-by: Evan Quan 

I have the same copy as Matt. @Li, Dennis maybe you were working on an old copy?

BR
Evan
-Original Message-
From: amd-gfx  On Behalf Of Matt Coffin
Sent: Friday, August 14, 2020 11:14 AM
To: Li, Dennis ; amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian 
Subject: Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for 
pp_od_clk_voltage

Hey Dennis,



Thanks for the testing.



I'm having some issues reproducing, as that command is working fine for me in 
sh, bash, and zsh. So just to confirm a few things while I look at it...



1. What kind of SMU is on whatever card you're testing on? Looks like smu_v11+ 
to me?

2. (shouldn't matter if you're right about which -EINVAL return is being hit), 
but is OverDrive enabled?

3. Is this based off of latest amd-staging-drm-next?



This is the code block I'm seeing on the HEAD of alex's branch... which is a 
bit different from what you pasted.



This error also happens **before** the infinite loop I was fixing with this 
patch, but might as well get both birds with one stone if there's still an 
issue.



while (tmp_str[0]) {

sub_str = strsep(_str, delimiter);

ret = kstrtol(sub_str, 0, [parameter_size]);

if (ret)

return -EINVAL;

parameter_size++;



while (isspace(*tmp_str))

tmp_str++;

}

On 8/13/20 8:14 PM, Li, Dennis wrote:
> [AMD Official Use Only - Internal Distribution Only]
>
> Hi, Matt,
>   With your change, I still could reproduce the following issue:
>
> # echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage
> bash: echo: write error: Invalid argument
>
>  I found that it is related the following lines code, could you help 
> double check it?
>
> while ((sub_str = strsep(_str, delimiter)) != NULL) {  // sub_str will be 
> empty string
> ret = kstrtol(sub_str, 0, [parameter_size]);
> if (ret)
> return -EINVAL; // return here
> parameter_size++;
>
> while (isspace(*tmp_str))
> tmp_str++;
> }
>
> Best Regards
> Dennis Li
> -Original Message-
> From: Matt Coffin 
> Sent: Friday, August 14, 2020 9:15 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Koenig, Christian ; Li, Dennis
> ; Matt Coffin 
> Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for
> pp_od_clk_voltage
>
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue 
> with the sysfs interface for pp_od_clk_voltage. It overwrites the return 
> value to 0 when it calls another function, then returns 0. The intended 
> behavior is that a positive return value indicates the number of bytes from 
> the buffer that you processed in that call.
>
> With the 0 return value, clients would submit the same value to be written 
> over and over again, resulting in an infinite loop.
>
> This is resolved by returning the count of bytes read (in this case the whole 
> message), when the desired return is 0 (success).
>
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug:
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl
> ab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7C
> evan.quan%40amd.com%7C08991e3858f144a4dd0908d840001324%7C3dd8961fe4884
> e608e11a82d994e183d%7C0%7C0%7C637329716385273288sdata=w%2FBlX8CpG
> 0TfTYd1InX8Z%2FMBrRbVu%2FT4zWehWKHClCE%3Dreserved=0
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t
> amdgpu_set_pp_od_clk_voltage(struct device *dev,
>
>  pro_end:
>  up_read(>reset_sem);
> -return ret;
> +if (ret) {
> +return ret;
> +} else {
> +return count;
> +}
>  }
>
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
> --
> 2.28.0
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Cevan.quan%40amd.com%7C08991e3858f144a4dd0908d840001324%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329716385273288sdata=IbsPW7%2Fr2HXLxKK4%2FOb6fqFmZXtyivbTsP9ftd7P2Dw%3Dreserved=0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-13 Thread Matt Coffin
Hey Dennis,



Thanks for the testing.



I'm having some issues reproducing, as that command is working fine for
me in sh, bash, and zsh. So just to confirm a few things while I look at
it...



1. What kind of SMU is on whatever card you're testing on? Looks like
smu_v11+ to me?

2. (shouldn't matter if you're right about which -EINVAL return is being
hit), but is OverDrive enabled?

3. Is this based off of latest amd-staging-drm-next?



This is the code block I'm seeing on the HEAD of alex's branch... which
is a bit different from what you pasted.



This error also happens **before** the infinite loop I was fixing with
this patch, but might as well get both birds with one stone if there's
still an issue.



while (tmp_str[0]) {

sub_str = strsep(_str, delimiter);

ret = kstrtol(sub_str, 0, [parameter_size]);

if (ret)

return -EINVAL;

parameter_size++;



while (isspace(*tmp_str))

tmp_str++;

}

On 8/13/20 8:14 PM, Li, Dennis wrote:
> [AMD Official Use Only - Internal Distribution Only]
> 
> Hi, Matt,
>   With your change, I still could reproduce the following issue:
> 
> # echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage
> bash: echo: write error: Invalid argument
> 
>  I found that it is related the following lines code, could you help 
> double check it?
> 
>   while ((sub_str = strsep(_str, delimiter)) != NULL) {  // sub_str 
> will be empty string
>   ret = kstrtol(sub_str, 0, [parameter_size]);
>   if (ret)
>   return -EINVAL; // return here
>   parameter_size++;
> 
>   while (isspace(*tmp_str))
>   tmp_str++;
>   }
> 
> Best Regards
> Dennis Li
> -Original Message-
> From: Matt Coffin  
> Sent: Friday, August 14, 2020 9:15 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Koenig, Christian ; Li, Dennis 
> ; Matt Coffin 
> Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for 
> pp_od_clk_voltage
> 
> The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue 
> with the sysfs interface for pp_od_clk_voltage. It overwrites the return 
> value to 0 when it calls another function, then returns 0. The intended 
> behavior is that a positive return value indicates the number of bytes from 
> the buffer that you processed in that call.
> 
> With the 0 return value, clients would submit the same value to be written 
> over and over again, resulting in an infinite loop.
> 
> This is resolved by returning the count of bytes read (in this case the whole 
> message), when the desired return is 0 (success).
> 
> Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
> Bug: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7CDennis.Li%40amd.com%7C4de8308bf7974ea9e62308d83fef922b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329646078379799sdata=N9c6e7cUMCDpvBIYUEzxkadJbJdBryXyfhfhb%2BUEwjg%3Dreserved=0
> Signed-off-by: Matt Coffin 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> index 1705e328c6fc..f00c7ed361d4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
> @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct 
> device *dev,
>  
>  pro_end:
>   up_read(>reset_sem);
> - return ret;
> + if (ret) {
> + return ret;
> + } else {
> + return count;
> + }
>  }
>  
>  static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
> --
> 2.28.0
> 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage

2020-08-13 Thread Li, Dennis
[AMD Official Use Only - Internal Distribution Only]

Hi, Matt,
  With your change, I still could reproduce the following issue:

# echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage
bash: echo: write error: Invalid argument

 I found that it is related the following lines code, could you help double 
check it?

while ((sub_str = strsep(_str, delimiter)) != NULL) {  // sub_str 
will be empty string
ret = kstrtol(sub_str, 0, [parameter_size]);
if (ret)
return -EINVAL; // return here
parameter_size++;

while (isspace(*tmp_str))
tmp_str++;
}

Best Regards
Dennis Li
-Original Message-
From: Matt Coffin  
Sent: Friday, August 14, 2020 9:15 AM
To: amd-gfx@lists.freedesktop.org
Cc: Koenig, Christian ; Li, Dennis 
; Matt Coffin 
Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for 
pp_od_clk_voltage

The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue 
with the sysfs interface for pp_od_clk_voltage. It overwrites the return value 
to 0 when it calls another function, then returns 0. The intended behavior is 
that a positive return value indicates the number of bytes from the buffer that 
you processed in that call.

With the 0 return value, clients would submit the same value to be written over 
and over again, resulting in an infinite loop.

This is resolved by returning the count of bytes read (in this case the whole 
message), when the desired return is 0 (success).

Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU")
Bug: 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7CDennis.Li%40amd.com%7C4de8308bf7974ea9e62308d83fef922b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329646078379799sdata=N9c6e7cUMCDpvBIYUEzxkadJbJdBryXyfhfhb%2BUEwjg%3Dreserved=0
Signed-off-by: Matt Coffin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index 1705e328c6fc..f00c7ed361d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device 
*dev,
 
 pro_end:
up_read(>reset_sem);
-   return ret;
+   if (ret) {
+   return ret;
+   } else {
+   return count;
+   }
 }
 
 static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev,
--
2.28.0
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx