Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage
[AMD Public Use] No need to apologize, it was a good catch. Something to double check in the future if something similar ends up getting applied again. Thanks! Alex From: amd-gfx on behalf of Matt Coffin Sent: Friday, August 14, 2020 3:13 PM To: amd-gfx@lists.freedesktop.org Cc: Quan, Evan ; Koenig, Christian ; Li, Dennis Subject: Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage Hi all, As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix system hang issue during GPU reset""), this patch is no longer needed, and won't apply, because the badly-behaving code was removed; I'll withdraw this patch for now. Sorry to those who wasted their time reviewing. Cheers, Matt On 8/13/20 7:15 PM, Matt Coffin wrote: > The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an > issue with the sysfs interface for pp_od_clk_voltage. It overwrites the > return value to 0 when it calls another function, then returns 0. The > intended behavior is that a positive return value indicates the number > of bytes from the buffer that you processed in that call. > > With the 0 return value, clients would submit the same value to be > written over and over again, resulting in an infinite loop. > > This is resolved by returning the count of bytes read (in this case the > whole message), when the desired return is 0 (success). > > Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU") > Bug: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7Calexander.deucher%40amd.com%7C3fb1fd9f95ad441a88a208d840862e80%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637330292979204288sdata=iABLxfzQVQa5zBK6a0JozvRYl%2Fg5eTFnfOS86r0g9rU%3Dreserved=0 > Signed-off-by: Matt Coffin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > index 1705e328c6fc..f00c7ed361d4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct > device *dev, > > pro_end: >up_read(>reset_sem); > - return ret; > + if (ret) { > + return ret; > + } else { > + return count; > + } > } > > static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Calexander.deucher%40amd.com%7C3fb1fd9f95ad441a88a208d840862e80%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637330292979204288sdata=GLWlbhgo0grFjcOh5ZAJWhXGxSm6Ufw8eDFDHZf95Uk%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage
Hi all, As of 026acaeac2d205f22c0f682cc1c7b1a85b9ccd00 ("drm/amdgpu: revert "fix system hang issue during GPU reset""), this patch is no longer needed, and won't apply, because the badly-behaving code was removed; I'll withdraw this patch for now. Sorry to those who wasted their time reviewing. Cheers, Matt On 8/13/20 7:15 PM, Matt Coffin wrote: > The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an > issue with the sysfs interface for pp_od_clk_voltage. It overwrites the > return value to 0 when it calls another function, then returns 0. The > intended behavior is that a positive return value indicates the number > of bytes from the buffer that you processed in that call. > > With the 0 return value, clients would submit the same value to be > written over and over again, resulting in an infinite loop. > > This is resolved by returning the count of bytes read (in this case the > whole message), when the desired return is 0 (success). > > Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU") > Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1245 > Signed-off-by: Matt Coffin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > index 1705e328c6fc..f00c7ed361d4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct > device *dev, > > pro_end: > up_read(>reset_sem); > - return ret; > + if (ret) { > + return ret; > + } else { > + return count; > + } > } > > static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage
[AMD Official Use Only - Internal Distribution Only] Thanks for catching this. The patch is reviewed-by: Evan Quan I have the same copy as Matt. @Li, Dennis maybe you were working on an old copy? BR Evan -Original Message- From: amd-gfx On Behalf Of Matt Coffin Sent: Friday, August 14, 2020 11:14 AM To: Li, Dennis ; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian Subject: Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage Hey Dennis, Thanks for the testing. I'm having some issues reproducing, as that command is working fine for me in sh, bash, and zsh. So just to confirm a few things while I look at it... 1. What kind of SMU is on whatever card you're testing on? Looks like smu_v11+ to me? 2. (shouldn't matter if you're right about which -EINVAL return is being hit), but is OverDrive enabled? 3. Is this based off of latest amd-staging-drm-next? This is the code block I'm seeing on the HEAD of alex's branch... which is a bit different from what you pasted. This error also happens **before** the infinite loop I was fixing with this patch, but might as well get both birds with one stone if there's still an issue. while (tmp_str[0]) { sub_str = strsep(_str, delimiter); ret = kstrtol(sub_str, 0, [parameter_size]); if (ret) return -EINVAL; parameter_size++; while (isspace(*tmp_str)) tmp_str++; } On 8/13/20 8:14 PM, Li, Dennis wrote: > [AMD Official Use Only - Internal Distribution Only] > > Hi, Matt, > With your change, I still could reproduce the following issue: > > # echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage > bash: echo: write error: Invalid argument > > I found that it is related the following lines code, could you help > double check it? > > while ((sub_str = strsep(_str, delimiter)) != NULL) { // sub_str will be > empty string > ret = kstrtol(sub_str, 0, [parameter_size]); > if (ret) > return -EINVAL; // return here > parameter_size++; > > while (isspace(*tmp_str)) > tmp_str++; > } > > Best Regards > Dennis Li > -Original Message- > From: Matt Coffin > Sent: Friday, August 14, 2020 9:15 AM > To: amd-gfx@lists.freedesktop.org > Cc: Koenig, Christian ; Li, Dennis > ; Matt Coffin > Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for > pp_od_clk_voltage > > The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue > with the sysfs interface for pp_od_clk_voltage. It overwrites the return > value to 0 when it calls another function, then returns 0. The intended > behavior is that a positive return value indicates the number of bytes from > the buffer that you processed in that call. > > With the 0 return value, clients would submit the same value to be written > over and over again, resulting in an infinite loop. > > This is resolved by returning the count of bytes read (in this case the whole > message), when the desired return is 0 (success). > > Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU") > Bug: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitl > ab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7C > evan.quan%40amd.com%7C08991e3858f144a4dd0908d840001324%7C3dd8961fe4884 > e608e11a82d994e183d%7C0%7C0%7C637329716385273288sdata=w%2FBlX8CpG > 0TfTYd1InX8Z%2FMBrRbVu%2FT4zWehWKHClCE%3Dreserved=0 > Signed-off-by: Matt Coffin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > index 1705e328c6fc..f00c7ed361d4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > @@ -937,7 +937,11 @@ static ssize_t > amdgpu_set_pp_od_clk_voltage(struct device *dev, > > pro_end: > up_read(>reset_sem); > -return ret; > +if (ret) { > +return ret; > +} else { > +return count; > +} > } > > static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, > -- > 2.28.0 > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfxdata=02%7C01%7Cevan.quan%40amd.com%7C08991e3858f144a4dd0908d840001324%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329716385273288sdata=IbsPW7%2Fr2HXLxKK4%2FOb6fqFmZXtyivbTsP9ftd7P2Dw%3Dreserved=0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage
Hey Dennis, Thanks for the testing. I'm having some issues reproducing, as that command is working fine for me in sh, bash, and zsh. So just to confirm a few things while I look at it... 1. What kind of SMU is on whatever card you're testing on? Looks like smu_v11+ to me? 2. (shouldn't matter if you're right about which -EINVAL return is being hit), but is OverDrive enabled? 3. Is this based off of latest amd-staging-drm-next? This is the code block I'm seeing on the HEAD of alex's branch... which is a bit different from what you pasted. This error also happens **before** the infinite loop I was fixing with this patch, but might as well get both birds with one stone if there's still an issue. while (tmp_str[0]) { sub_str = strsep(_str, delimiter); ret = kstrtol(sub_str, 0, [parameter_size]); if (ret) return -EINVAL; parameter_size++; while (isspace(*tmp_str)) tmp_str++; } On 8/13/20 8:14 PM, Li, Dennis wrote: > [AMD Official Use Only - Internal Distribution Only] > > Hi, Matt, > With your change, I still could reproduce the following issue: > > # echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage > bash: echo: write error: Invalid argument > > I found that it is related the following lines code, could you help > double check it? > > while ((sub_str = strsep(_str, delimiter)) != NULL) { // sub_str > will be empty string > ret = kstrtol(sub_str, 0, [parameter_size]); > if (ret) > return -EINVAL; // return here > parameter_size++; > > while (isspace(*tmp_str)) > tmp_str++; > } > > Best Regards > Dennis Li > -Original Message- > From: Matt Coffin > Sent: Friday, August 14, 2020 9:15 AM > To: amd-gfx@lists.freedesktop.org > Cc: Koenig, Christian ; Li, Dennis > ; Matt Coffin > Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for > pp_od_clk_voltage > > The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue > with the sysfs interface for pp_od_clk_voltage. It overwrites the return > value to 0 when it calls another function, then returns 0. The intended > behavior is that a positive return value indicates the number of bytes from > the buffer that you processed in that call. > > With the 0 return value, clients would submit the same value to be written > over and over again, resulting in an infinite loop. > > This is resolved by returning the count of bytes read (in this case the whole > message), when the desired return is 0 (success). > > Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU") > Bug: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7CDennis.Li%40amd.com%7C4de8308bf7974ea9e62308d83fef922b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329646078379799sdata=N9c6e7cUMCDpvBIYUEzxkadJbJdBryXyfhfhb%2BUEwjg%3Dreserved=0 > Signed-off-by: Matt Coffin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > index 1705e328c6fc..f00c7ed361d4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c > @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct > device *dev, > > pro_end: > up_read(>reset_sem); > - return ret; > + if (ret) { > + return ret; > + } else { > + return count; > + } > } > > static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, > -- > 2.28.0 > ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage
[AMD Official Use Only - Internal Distribution Only] Hi, Matt, With your change, I still could reproduce the following issue: # echo "s 1 1900" > /sys/class/drm/card0/device/pp_od_clk_voltage bash: echo: write error: Invalid argument I found that it is related the following lines code, could you help double check it? while ((sub_str = strsep(_str, delimiter)) != NULL) { // sub_str will be empty string ret = kstrtol(sub_str, 0, [parameter_size]); if (ret) return -EINVAL; // return here parameter_size++; while (isspace(*tmp_str)) tmp_str++; } Best Regards Dennis Li -Original Message- From: Matt Coffin Sent: Friday, August 14, 2020 9:15 AM To: amd-gfx@lists.freedesktop.org Cc: Koenig, Christian ; Li, Dennis ; Matt Coffin Subject: [PATCH] drm/amdgpu: Fix incorrect return value in sysfs for pp_od_clk_voltage The changes in edad8312cbbf9a33c86873fc4093664f150dd5c1 introduced an issue with the sysfs interface for pp_od_clk_voltage. It overwrites the return value to 0 when it calls another function, then returns 0. The intended behavior is that a positive return value indicates the number of bytes from the buffer that you processed in that call. With the 0 return value, clients would submit the same value to be written over and over again, resulting in an infinite loop. This is resolved by returning the count of bytes read (in this case the whole message), when the desired return is 0 (success). Fixes: edad8312cbbf ("drm/amdgpu: fix system hang issue during GPU") Bug: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1245data=02%7C01%7CDennis.Li%40amd.com%7C4de8308bf7974ea9e62308d83fef922b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637329646078379799sdata=N9c6e7cUMCDpvBIYUEzxkadJbJdBryXyfhfhb%2BUEwjg%3Dreserved=0 Signed-off-by: Matt Coffin --- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c index 1705e328c6fc..f00c7ed361d4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c @@ -937,7 +937,11 @@ static ssize_t amdgpu_set_pp_od_clk_voltage(struct device *dev, pro_end: up_read(>reset_sem); - return ret; + if (ret) { + return ret; + } else { + return count; + } } static ssize_t amdgpu_get_pp_od_clk_voltage(struct device *dev, -- 2.28.0 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx