RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Koenig, Christian
Yeah, that should work as well.

The extra text is a bit questionable, but I think we can live with that as well.

Christian.

Am 08.08.2019 17:48 schrieb "Zhang, Hawking" :
Not exactly, there will be only one line left
feature mask: 0x3ffb

Regards,
Hawking
-Original Message-
From: Freehill, Chris 
Sent: 2019年8月8日 23:28
To: Koenig, Christian ; Zhang, Hawking 
; Russell, Kent ; Zhou1, Tao 
; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

I don't want to divert from this important discussion, but would like 
clarification...

In the rocm-smi library, I previously had this interpretation of the values (I 
think someone had confirmed this):
none,   //!< No current errors
disabled   //!< ECC is disabled
parity //!< ECC errors present, but type unknown
 single_correctable, //!< Single correctable error
multi_correctable//!< Multiple uncorrectable errors
poison //!< Firmware detected error and isolated
 //!< page. Treat as uncorrectable.

It seems now we are just saying it's enabled/disabled, but no indication of 
whether there are errors or not. Is that right, or am I misunderstanding 
something?

Chris

-Original Message-
From: Koenig, Christian 
Sent: Thursday, August 8, 2019 10:24 AM
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

yeah, but this isn't sufficient to comply to the upstream rules.

See what we need to do is to remove the extra text and the per IP information. 
In other words something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a4412e47810..c8c7f9655ffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -819,22 +819,7 @@ static ssize_t
amdgpu_ras_sysfs_features_read(struct device *dev,
 ssize_t s;
 struct ras_manager *obj;

-   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n",
con->features);
-
-   for (i = 0; i < ras_block_count; i++) {
-   head.block = i;
-
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
- ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
-   }
+   s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features);

 return s;
  }

And I'm not talking about some rule we could bend. This is an absolutely
*MUST* have for upstreaming.

Regards,
Christian.

Am 08.08.19 um 17:17 schrieb Zhang, Hawking:
> Hi Chris,
>
> I thought I've stated as "The feature mask is already good enough for this 
> node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
> information from feature node except for the feature mask.
>
> Regards,
> Hawking
>
> -Original Message-
> From: Koenig, Christian 
> Sent: 2019年8月8日 23:11
> To: Zhang, Hawking ; Russell, Kent
> ; Zhou1, Tao ;
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ;
> Freehill, Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> a multi line value is not the problem, but here you have multiple values.
>
> E.g. in the case of the pp tables that is one big array of power profiles and 
> we actually had a discussion if exposing them like this is ok or not.
>
> But in the case of the ras features you got multiple different things in the 
> same file. And that in turn is a clear violation of the sysfs rules.
>
> I don't think we can't upstream the interface like this. See here for
> a good summary of the rules as well: https://lwn.net/Articles/378884/
>
> Regards,
> Christian.
>
> Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
>> Hi Chris,
>>
>> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
>> intended to be used by upper level apps/libs.
>>
>> There are bunches of sysfs entries that have multiple line value. The most 
>> complicate one would be pp_power_profile_mode, which looks like.
>>
>> 0 BOOTUP_DEFAULT*:
>>   0(   GFXCLK)   0   0   1   0   
>> 4 800 4587520  -65536   0
>> 

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Zhang, Hawking
Not exactly, there will be only one line left
feature mask: 0x3ffb

Regards,
Hawking
-Original Message-
From: Freehill, Chris  
Sent: 2019年8月8日 23:28
To: Koenig, Christian ; Zhang, Hawking 
; Russell, Kent ; Zhou1, Tao 
; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

I don't want to divert from this important discussion, but would like 
clarification...

In the rocm-smi library, I previously had this interpretation of the values (I 
think someone had confirmed this):
none,   //!< No current errors
disabled   //!< ECC is disabled
parity //!< ECC errors present, but type unknown
 single_correctable, //!< Single correctable error
multi_correctable//!< Multiple uncorrectable errors
poison //!< Firmware detected error and isolated
 //!< page. Treat as uncorrectable.

It seems now we are just saying it's enabled/disabled, but no indication of 
whether there are errors or not. Is that right, or am I misunderstanding 
something?

Chris

-Original Message-
From: Koenig, Christian 
Sent: Thursday, August 8, 2019 10:24 AM
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

yeah, but this isn't sufficient to comply to the upstream rules.

See what we need to do is to remove the extra text and the per IP information. 
In other words something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a4412e47810..c8c7f9655ffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -819,22 +819,7 @@ static ssize_t
amdgpu_ras_sysfs_features_read(struct device *dev,
     ssize_t s;
     struct ras_manager *obj;

-   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", 
con->features);
-
-   for (i = 0; i < ras_block_count; i++) {
-   head.block = i;
-
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
- ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
-   }
+   s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features);

     return s;
  }

And I'm not talking about some rule we could bend. This is an absolutely
*MUST* have for upstreaming.

Regards,
Christian.

Am 08.08.19 um 17:17 schrieb Zhang, Hawking:
> Hi Chris,
>
> I thought I've stated as "The feature mask is already good enough for this 
> node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
> information from feature node except for the feature mask.
>
> Regards,
> Hawking
>
> -Original Message-
> From: Koenig, Christian 
> Sent: 2019年8月8日 23:11
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
> Freehill, Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> a multi line value is not the problem, but here you have multiple values.
>
> E.g. in the case of the pp tables that is one big array of power profiles and 
> we actually had a discussion if exposing them like this is ok or not.
>
> But in the case of the ras features you got multiple different things in the 
> same file. And that in turn is a clear violation of the sysfs rules.
>
> I don't think we can't upstream the interface like this. See here for 
> a good summary of the rules as well: https://lwn.net/Articles/378884/
>
> Regards,
> Christian.
>
> Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
>> Hi Chris,
>>
>> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
>> intended to be used by upper level apps/libs.
>>
>> There are bunches of sysfs entries that have multiple line value. The most 
>> complicate one would be pp_power_profile_mode, which looks like.
>>
>> 0 BOOTUP_DEFAULT*:
>>   0(   GFXCLK)   0   0   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   0   1   0   
>> 4 800  327680   -6553   0
>>   2( UCLK)   0   0   1  

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Freehill, Chris
Hi Hawking,

I don't want to divert from this important discussion, but would like 
clarification...

In the rocm-smi library, I previously had this interpretation of the values (I 
think someone had confirmed this):
none,   //!< No current errors
disabled   //!< ECC is disabled
parity //!< ECC errors present, but type unknown
 single_correctable, //!< Single correctable error
multi_correctable//!< Multiple uncorrectable errors
poison //!< Firmware detected error and isolated
 //!< page. Treat as uncorrectable.

It seems now we are just saying it's enabled/disabled, but no indication of 
whether there are errors or not. Is that right, or am I misunderstanding 
something?

Chris

-Original Message-
From: Koenig, Christian  
Sent: Thursday, August 8, 2019 10:24 AM
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

yeah, but this isn't sufficient to comply to the upstream rules.

See what we need to do is to remove the extra text and the per IP information. 
In other words something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a4412e47810..c8c7f9655ffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -819,22 +819,7 @@ static ssize_t
amdgpu_ras_sysfs_features_read(struct device *dev,
     ssize_t s;
     struct ras_manager *obj;

-   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", 
con->features);
-
-   for (i = 0; i < ras_block_count; i++) {
-   head.block = i;
-
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
- ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
-   }
+   s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features);

     return s;
  }

And I'm not talking about some rule we could bend. This is an absolutely
*MUST* have for upstreaming.

Regards,
Christian.

Am 08.08.19 um 17:17 schrieb Zhang, Hawking:
> Hi Chris,
>
> I thought I've stated as "The feature mask is already good enough for this 
> node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
> information from feature node except for the feature mask.
>
> Regards,
> Hawking
>
> -Original Message-
> From: Koenig, Christian 
> Sent: 2019年8月8日 23:11
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
> Freehill, Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> a multi line value is not the problem, but here you have multiple values.
>
> E.g. in the case of the pp tables that is one big array of power profiles and 
> we actually had a discussion if exposing them like this is ok or not.
>
> But in the case of the ras features you got multiple different things in the 
> same file. And that in turn is a clear violation of the sysfs rules.
>
> I don't think we can't upstream the interface like this. See here for 
> a good summary of the rules as well: https://lwn.net/Articles/378884/
>
> Regards,
> Christian.
>
> Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
>> Hi Chris,
>>
>> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
>> intended to be used by upper level apps/libs.
>>
>> There are bunches of sysfs entries that have multiple line value. The most 
>> complicate one would be pp_power_profile_mode, which looks like.
>>
>> 0 BOOTUP_DEFAULT*:
>>   0(   GFXCLK)   0   0   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   0   1   0   
>> 4 800  327680   -6553   0
>>   2( UCLK)   0   0   1   0   
>> 4 800  327680  -65536   0
>> ...
>> 1 3D_FULL_SCREEN :
>>   0(   GFXCLK)   0   1   1   0   
>> 4 800 4587520  -65536   0
>>               1(   SOCCLK)   0   1   4 850   
>> 4 800  327680

Re: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Koenig, Christian
Hi Hawking,

yeah, but this isn't sufficient to comply to the upstream rules.

See what we need to do is to remove the extra text and the per IP 
information. In other words something like this:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1a4412e47810..c8c7f9655ffc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -819,22 +819,7 @@ static ssize_t 
amdgpu_ras_sysfs_features_read(struct device *dev,
     ssize_t s;
     struct ras_manager *obj;

-   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", 
con->features);
-
-   for (i = 0; i < ras_block_count; i++) {
-   head.block = i;
-
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
- ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
-   }
+   s = scnprintf(buf, PAGE_SIZE, "0x%x\n", con->features);

     return s;
  }

And I'm not talking about some rule we could bend. This is an absolutely 
*MUST* have for upstreaming.

Regards,
Christian.

Am 08.08.19 um 17:17 schrieb Zhang, Hawking:
> Hi Chris,
>
> I thought I've stated as "The feature mask is already good enough for this 
> node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
> information from feature node except for the feature mask.
>
> Regards,
> Hawking
>
> -Original Message-
> From: Koenig, Christian 
> Sent: 2019年8月8日 23:11
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
> Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> a multi line value is not the problem, but here you have multiple values.
>
> E.g. in the case of the pp tables that is one big array of power profiles and 
> we actually had a discussion if exposing them like this is ok or not.
>
> But in the case of the ras features you got multiple different things in the 
> same file. And that in turn is a clear violation of the sysfs rules.
>
> I don't think we can't upstream the interface like this. See here for a good 
> summary of the rules as well: https://lwn.net/Articles/378884/
>
> Regards,
> Christian.
>
> Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
>> Hi Chris,
>>
>> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
>> intended to be used by upper level apps/libs.
>>
>> There are bunches of sysfs entries that have multiple line value. The most 
>> complicate one would be pp_power_profile_mode, which looks like.
>>
>> 0 BOOTUP_DEFAULT*:
>>   0(   GFXCLK)   0   0   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   0   1   0   
>> 4 800  327680   -6553   0
>>   2( UCLK)   0   0   1   0   
>> 4 800  327680  -65536   0
>> ...
>> 1 3D_FULL_SCREEN :
>>   0(   GFXCLK)   0   1   1   0   
>> 4 800 4587520  -65536   0
>>   1(   SOCCLK)   0   1   4 850   
>> 4 800  327680  -65536   0
>>
>> Regards,
>> Hawking
>> -Original Message-
>> From: Christian König 
>> Sent: 2019年8月8日 22:25
>> To: Zhang, Hawking ; Russell, Kent
>> ; Zhou1, Tao ;
>> amd-gfx@lists.freedesktop.org; Pan, Xinhui ;
>> Freehill, Chris 
>> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> Hi Hawking,
>>
>> looks like you skipped my response.
>>
>> Even the current way how sysfs is used in the ras code is a clear NO-GO and 
>> should be fixed before this is pushed upstream.
>>
>> A sysfs entry should seriously NOT return a multi line value which needs to 
>> be extensively parsed by the application.
>>
>> Regards,
>> Christian.
>>
>> Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
>>> Understood and agree we should keep stable interfaces.
>>>
>>> But the information in feature node is not correct and makes people

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Zhang, Hawking
Hi Chris,

I thought I've stated as "The feature mask is already good enough for this 
node". It is just a uint32 value. We don't expect ROCm SMI to read any other 
information from feature node except for the feature mask. 

Regards,
Hawking

-Original Message-
From: Koenig, Christian  
Sent: 2019年8月8日 23:11
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

a multi line value is not the problem, but here you have multiple values.

E.g. in the case of the pp tables that is one big array of power profiles and 
we actually had a discussion if exposing them like this is ok or not.

But in the case of the ras features you got multiple different things in the 
same file. And that in turn is a clear violation of the sysfs rules.

I don't think we can't upstream the interface like this. See here for a good 
summary of the rules as well: https://lwn.net/Articles/378884/

Regards,
Christian.

Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
> Hi Chris,
>
> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
> intended to be used by upper level apps/libs.
>
> There are bunches of sysfs entries that have multiple line value. The most 
> complicate one would be pp_power_profile_mode, which looks like.
>
> 0 BOOTUP_DEFAULT*:
>  0(   GFXCLK)   0   0   1   0   4 
> 800 4587520  -65536   0
>  1(   SOCCLK)   0   0   1   0   4 
> 800  327680   -6553   0
>  2( UCLK)   0   0   1   0   4 
> 800  327680  -65536   0
> ...
> 1 3D_FULL_SCREEN :
>  0(   GFXCLK)   0   1   1   0   4 
> 800 4587520  -65536   0
>  1(   SOCCLK)   0   1   4 850   4 
> 800  327680  -65536   0
>
> Regards,
> Hawking
> -Original Message-
> From: Christian König 
> Sent: 2019年8月8日 22:25
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
> Freehill, Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> looks like you skipped my response.
>
> Even the current way how sysfs is used in the ras code is a clear NO-GO and 
> should be fixed before this is pushed upstream.
>
> A sysfs entry should seriously NOT return a multi line value which needs to 
> be extensively parsed by the application.
>
> Regards,
> Christian.
>
> Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
>> Understood and agree we should keep stable interfaces.
>>
>> But the information in feature node is not correct and makes people 
>> confusing. Basically, each IP blocks can support all the four error types, 
>> not just multi-uncorrectable. As a result, any upper level apps/libs that 
>> read from this file will just get confusing information as well. The feature 
>> mask is already good enough for this node.
>>
>> Regards,
>> Hawking
>> -----Original Message-
>> From: Russell, Kent 
>> Sent: 2019年8月8日 20:51
>> To: Zhang, Hawking ; Zhou1, Tao 
>> ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 
>> ; Freehill, Chris 
>> Cc: Zhou1, Tao 
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> +Chris Freehill
>>
>> While I can understand this change, this broke our SMI interface, which was 
>> expecting a specific string format for the ras/features file. This has 
>> happened a few times now, where changes to the RAS sysfs files has broke the 
>> SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs 
>> format set up before publishing patches? This is creating a lot of extra 
>> work for developers with the SMI to constantly keep up with the changes 
>> being made to sysfs files. Thank you.
>>
>>Kent
>>
>> -Original Message-
>> From: amd-gfx  On Behalf Of 
>> Zhang, Hawking
>> Sent: Monday, August 5, 2019 4:15 AM
>> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; 
>> Pan, Xinhui 
>> Cc: Zhou1, Tao 
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> Reviewed-by: Hawking Zhang 
>>
>> Regards,
>> Hawking
>> -Original Message-
>> From: amd-gfx  On Behalf Of 
>> Tao Zhou
>> Sent: 2019年8月5日 16:04
>> To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
>> Zhang, Hawking 
>> Cc: Zhou1, Tao 
>> Subject: [PATCH] drm/amdgpu: upda

Re: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Koenig, Christian
Hi Hawking,

a multi line value is not the problem, but here you have multiple values.

E.g. in the case of the pp tables that is one big array of power 
profiles and we actually had a discussion if exposing them like this is 
ok or not.

But in the case of the ras features you got multiple different things in 
the same file. And that in turn is a clear violation of the sysfs rules.

I don't think we can't upstream the interface like this. See here for a 
good summary of the rules as well: https://lwn.net/Articles/378884/

Regards,
Christian.

Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
> Hi Chris,
>
> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
> intended to be used by upper level apps/libs.
>
> There are bunches of sysfs entries that have multiple line value. The most 
> complicate one would be pp_power_profile_mode, which looks like.
>
> 0 BOOTUP_DEFAULT*:
>  0(   GFXCLK)   0   0   1   0   4 
> 800 4587520  -65536   0
>  1(   SOCCLK)   0   0   1   0   4 
> 800  327680   -6553   0
>  2( UCLK)   0   0   1   0   4 
> 800  327680  -65536   0
> ...
> 1 3D_FULL_SCREEN :
>  0(   GFXCLK)   0   1   1   0   4 
> 800 4587520  -65536   0
>  1(   SOCCLK)   0   1   4 850   4 
> 800  327680  -65536   0
>
> Regards,
> Hawking
> -Original Message-
> From: Christian König 
> Sent: 2019年8月8日 22:25
> To: Zhang, Hawking ; Russell, Kent 
> ; Zhou1, Tao ; 
> amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
> Chris 
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> looks like you skipped my response.
>
> Even the current way how sysfs is used in the ras code is a clear NO-GO and 
> should be fixed before this is pushed upstream.
>
> A sysfs entry should seriously NOT return a multi line value which needs to 
> be extensively parsed by the application.
>
> Regards,
> Christian.
>
> Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
>> Understood and agree we should keep stable interfaces.
>>
>> But the information in feature node is not correct and makes people 
>> confusing. Basically, each IP blocks can support all the four error types, 
>> not just multi-uncorrectable. As a result, any upper level apps/libs that 
>> read from this file will just get confusing information as well. The feature 
>> mask is already good enough for this node.
>>
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: Russell, Kent 
>> Sent: 2019年8月8日 20:51
>> To: Zhang, Hawking ; Zhou1, Tao
>> ; amd-gfx@lists.freedesktop.org; Pan, Xinhui
>> ; Freehill, Chris 
>> Cc: Zhou1, Tao 
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> +Chris Freehill
>>
>> While I can understand this change, this broke our SMI interface, which was 
>> expecting a specific string format for the ras/features file. This has 
>> happened a few times now, where changes to the RAS sysfs files has broke the 
>> SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs 
>> format set up before publishing patches? This is creating a lot of extra 
>> work for developers with the SMI to constantly keep up with the changes 
>> being made to sysfs files. Thank you.
>>
>>Kent
>>
>> -Original Message-
>> From: amd-gfx  On Behalf Of
>> Zhang, Hawking
>> Sent: Monday, August 5, 2019 4:15 AM
>> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org;
>> Pan, Xinhui 
>> Cc: Zhou1, Tao 
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> Reviewed-by: Hawking Zhang 
>>
>> Regards,
>> Hawking
>> -Original Message-
>> From: amd-gfx  On Behalf Of Tao
>> Zhou
>> Sent: 2019年8月5日 16:04
>> To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ;
>> Zhang, Hawking 
>> Cc: Zhou1, Tao 
>> Subject: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> remove confused ras error type info
>>
>> Signed-off-by: Tao Zhou 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
>>1 file changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index d2e8a85f6e38..369651247b23 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ 

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Zhang, Hawking
Hi Chris,

I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are 
intended to be used by upper level apps/libs. 

There are bunches of sysfs entries that have multiple line value. The most 
complicate one would be pp_power_profile_mode, which looks like. 

0 BOOTUP_DEFAULT*:
0(   GFXCLK)   0   0   1   0   4
 800 4587520  -65536   0
1(   SOCCLK)   0   0   1   0   4
 800  327680   -6553   0
2( UCLK)   0   0   1   0   4
 800  327680  -65536   0
...
1 3D_FULL_SCREEN :
0(   GFXCLK)   0   1   1   0   4
 800 4587520  -65536   0
1(   SOCCLK)   0   1   4 850   4
 800  327680  -65536   0

Regards,
Hawking
-Original Message-
From: Christian König  
Sent: 2019年8月8日 22:25
To: Zhang, Hawking ; Russell, Kent 
; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

looks like you skipped my response.

Even the current way how sysfs is used in the ras code is a clear NO-GO and 
should be fixed before this is pushed upstream.

A sysfs entry should seriously NOT return a multi line value which needs to be 
extensively parsed by the application.

Regards,
Christian.

Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
> Understood and agree we should keep stable interfaces.
>
> But the information in feature node is not correct and makes people 
> confusing. Basically, each IP blocks can support all the four error types, 
> not just multi-uncorrectable. As a result, any upper level apps/libs that 
> read from this file will just get confusing information as well. The feature 
> mask is already good enough for this node.
>
> Regards,
> Hawking
> -Original Message-
> From: Russell, Kent 
> Sent: 2019年8月8日 20:51
> To: Zhang, Hawking ; Zhou1, Tao 
> ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 
> ; Freehill, Chris 
> Cc: Zhou1, Tao 
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> +Chris Freehill
>
> While I can understand this change, this broke our SMI interface, which was 
> expecting a specific string format for the ras/features file. This has 
> happened a few times now, where changes to the RAS sysfs files has broke the 
> SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs format 
> set up before publishing patches? This is creating a lot of extra work for 
> developers with the SMI to constantly keep up with the changes being made to 
> sysfs files. Thank you.
>
>   Kent
>
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Zhang, Hawking
> Sent: Monday, August 5, 2019 4:15 AM
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; 
> Pan, Xinhui 
> Cc: Zhou1, Tao 
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Reviewed-by: Hawking Zhang 
>
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of Tao 
> Zhou
> Sent: 2019年8月5日 16:04
> To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; 
> Zhang, Hawking 
> Cc: Zhou1, Tao 
> Subject: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> remove confused ras error type info
>
> Signed-off-by: Tao Zhou 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
>   1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d2e8a85f6e38..369651247b23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
> device *dev,
>   struct amdgpu_device *adev = ddev->dev_private;
>   struct ras_common_if head;
>   int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
> - int i;
> + int i, enabled;
>   ssize_t s;
> - struct ras_manager *obj;
>   
>   s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", 
> con->features);
>   
>   for (i = 0; i < ras_block_count; i++) {
>   head.block = i;
> + enabled = amdgpu_ras_is_feature_enabled(adev, &head);
>   
> - if (amdgpu_ras_is_feature_enabled(adev, &head)) {
> - obj = amdgpu_ras_find_obj(adev, &head);
> - s += scnprintf(&buf[s], PAGE_SIZE - s,
> - "%s: %s\n",
> - ras_block_str(i),
> - ras_err_str(obj->head.type)

Re: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Christian König

Hi Hawking,

looks like you skipped my response.

Even the current way how sysfs is used in the ras code is a clear NO-GO 
and should be fixed before this is pushed upstream.


A sysfs entry should seriously NOT return a multi line value which needs 
to be extensively parsed by the application.


Regards,
Christian.

Am 08.08.19 um 15:50 schrieb Zhang, Hawking:

Understood and agree we should keep stable interfaces.

But the information in feature node is not correct and makes people confusing. 
Basically, each IP blocks can support all the four error types, not just 
multi-uncorrectable. As a result, any upper level apps/libs that read from this 
file will just get confusing information as well. The feature mask is already 
good enough for this node.

Regards,
Hawking
-Original Message-
From: Russell, Kent 
Sent: 2019年8月8日 20:51
To: Zhang, Hawking ; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, Chris 

Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

+Chris Freehill

While I can understand this change, this broke our SMI interface, which was 
expecting a specific string format for the ras/features file. This has happened 
a few times now, where changes to the RAS sysfs files has broke the SMI CLI 
and/or SMI LIB. Can we please get a stable interface and sysfs format set up 
before publishing patches? This is creating a lot of extra work for developers 
with the SMI to constantly keep up with the changes being made to sysfs files. 
Thank you.

  Kent

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Monday, August 5, 2019 4:15 AM
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: 2019年8月5日 16:04
To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Zhang, Hawking 

Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update ras sysfs feature info

remove confused ras error type info

Signed-off-by: Tao Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
  1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
  
  	s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
  
  	for (i = 0; i < ras_block_count; i++) {

head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
  
-		if (amdgpu_ras_is_feature_enabled(adev, &head)) {

-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
  
  	return s;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Zhang, Hawking
Understood and agree we should keep stable interfaces. 

But the information in feature node is not correct and makes people confusing. 
Basically, each IP blocks can support all the four error types, not just 
multi-uncorrectable. As a result, any upper level apps/libs that read from this 
file will just get confusing information as well. The feature mask is already 
good enough for this node.

Regards,
Hawking
-Original Message-
From: Russell, Kent  
Sent: 2019年8月8日 20:51
To: Zhang, Hawking ; Zhou1, Tao ; 
amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Freehill, 
Chris 
Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

+Chris Freehill

While I can understand this change, this broke our SMI interface, which was 
expecting a specific string format for the ras/features file. This has happened 
a few times now, where changes to the RAS sysfs files has broke the SMI CLI 
and/or SMI LIB. Can we please get a stable interface and sysfs format set up 
before publishing patches? This is creating a lot of extra work for developers 
with the SMI to constantly keep up with the changes being made to sysfs files. 
Thank you.

 Kent

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Monday, August 5, 2019 4:15 AM
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: 2019年8月5日 16:04
To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Zhang, 
Hawking 
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update ras sysfs feature info

remove confused ras error type info

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
 
s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
 
for (i = 0; i < ras_block_count; i++) {
head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
 
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
 
return s;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Christian König

Hi Tao,

yeah Kent is absolutely right. A must have requirement for sysfs is a 
stable interface.


But there are a couple of other violations for sysfs rules as well:

s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
  
  	for (i = 0; i < ras_block_count; i++) {
A must have for sysfs is one value per file. So printing a feature mask 
and then multiple feature values is not really correct.


Additional to that sysfs is design for tool interfacing, not for user 
interfacing. So printing text like "feature xy is enabled/disable" is 
not very convenient either.


What you should do is to create files like amdgpu_ras_feature_xy and 
then return 0/1 based on if the feature is enabled or not.


Regards,
Christian.

Am 08.08.19 um 14:51 schrieb Russell, Kent:

+Chris Freehill

While I can understand this change, this broke our SMI interface, which was 
expecting a specific string format for the ras/features file. This has happened 
a few times now, where changes to the RAS sysfs files has broke the SMI CLI 
and/or SMI LIB. Can we please get a stable interface and sysfs format set up 
before publishing patches? This is creating a lot of extra work for developers 
with the SMI to constantly keep up with the changes being made to sysfs files. 
Thank you.

  Kent

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Monday, August 5, 2019 4:15 AM
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: 2019年8月5日 16:04
To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Zhang, Hawking 

Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update ras sysfs feature info

remove confused ras error type info

Signed-off-by: Tao Zhou 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
  1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
  
  	s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
  
  	for (i = 0; i < ras_block_count; i++) {

head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
  
-		if (amdgpu_ras_is_feature_enabled(adev, &head)) {

-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
  
  	return s;


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-08 Thread Russell, Kent
+Chris Freehill

While I can understand this change, this broke our SMI interface, which was 
expecting a specific string format for the ras/features file. This has happened 
a few times now, where changes to the RAS sysfs files has broke the SMI CLI 
and/or SMI LIB. Can we please get a stable interface and sysfs format set up 
before publishing patches? This is creating a lot of extra work for developers 
with the SMI to constantly keep up with the changes being made to sysfs files. 
Thank you.

 Kent

-Original Message-
From: amd-gfx  On Behalf Of Zhang, 
Hawking
Sent: Monday, August 5, 2019 4:15 AM
To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; Pan, Xinhui 

Cc: Zhou1, Tao 
Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: 2019年8月5日 16:04
To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Zhang, 
Hawking 
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update ras sysfs feature info

remove confused ras error type info

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
 
s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
 
for (i = 0; i < ras_block_count; i++) {
head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
 
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
 
return s;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-05 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Tao Zhou
Sent: 2019年8月5日 16:04
To: amd-gfx@lists.freedesktop.org; Pan, Xinhui ; Zhang, 
Hawking 
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdgpu: update ras sysfs feature info

remove confused ras error type info

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
 
s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
 
for (i = 0; i < ras_block_count; i++) {
head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
 
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
 
return s;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: update ras sysfs feature info

2019-08-05 Thread Tao Zhou
remove confused ras error type info

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2e8a85f6e38..369651247b23 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct 
device *dev,
struct amdgpu_device *adev = ddev->dev_private;
struct ras_common_if head;
int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
-   int i;
+   int i, enabled;
ssize_t s;
-   struct ras_manager *obj;
 
s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
 
for (i = 0; i < ras_block_count; i++) {
head.block = i;
+   enabled = amdgpu_ras_is_feature_enabled(adev, &head);
 
-   if (amdgpu_ras_is_feature_enabled(adev, &head)) {
-   obj = amdgpu_ras_find_obj(adev, &head);
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: %s\n",
-   ras_block_str(i),
-   ras_err_str(obj->head.type));
-   } else
-   s += scnprintf(&buf[s], PAGE_SIZE - s,
-   "%s: disabled\n",
-   ras_block_str(i));
+   s += scnprintf(&buf[s], PAGE_SIZE - s,
+   "%s ras feature mask: %s\n",
+   ras_block_str(i), enabled?"on":"off");
}
 
return s;
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx