[PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-12 Thread Luben Tuikov
Some ASIC support low-power functionality for the whole ASIC or just
an IP block. When in such low-power mode, some sysfs interfaces would
report a frequency of 0, e.g.,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz 
1: 0Mhz *
2: 2200Mhz 
$_

An operating frequency of 0 MHz doesn't make sense, and this interface
is designed to report only operating clock frequencies, i.e. non-zero,
and possibly the current one.

When in this low-power state, round to the smallest
operating frequency, for this interface, as follows,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz *
1: 2200Mhz 
$_

Luben Tuikov (5):
  drm/amd/pm: Slight function rename
  drm/amd/pm: Rename cur_value to curr_value
  drm/amd/pm: Rename freq_values --> freq_value
  dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
  dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency

 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
 2 files changed, 86 insertions(+), 47 deletions(-)

-- 
2.33.1.558.g2bd2f258f4



Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-12 Thread Lazar, Lijo




On 10/13/2021 8:40 AM, Luben Tuikov wrote:

Some ASIC support low-power functionality for the whole ASIC or just
an IP block. When in such low-power mode, some sysfs interfaces would
report a frequency of 0, e.g.,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz *
2: 2200Mhz
$_

An operating frequency of 0 MHz doesn't make sense, and this interface
is designed to report only operating clock frequencies, i.e. non-zero,
and possibly the current one.

When in this low-power state, round to the smallest
operating frequency, for this interface, as follows,



Would rather avoid this -

1) It is manipulating FW reported value. If at all there is an uncaught 
issue in FW reporting of frequency values, that is masked here.
2) Otherwise, if 0MHz is described as GFX power gated case, this 
provides a convenient interface to check if GFX is power gated.


If seeing a '0' is not pleasing, consider changing to something like
"NA" - not available (frequency cannot be fetched at the moment).

Thanks,
Lijo


$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz *
1: 2200Mhz
$_

Luben Tuikov (5):
   drm/amd/pm: Slight function rename
   drm/amd/pm: Rename cur_value to curr_value
   drm/amd/pm: Rename freq_values --> freq_value
   dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
   dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency

  .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
  2 files changed, 86 insertions(+), 47 deletions(-)



RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Quan, Evan
[AMD Official Use Only]

I agree with Lijo. 
Reporting a "round to the smallest operating frequency" just makes user more 
confusing. 
As per designed, the frequency marked with "*" should reflect the current clock 
frequency.

BR
Evan
> -Original Message-
> From: amd-gfx  On Behalf Of
> Lazar, Lijo
> Sent: Wednesday, October 13, 2021 12:14 PM
> To: Tuikov, Luben ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
> 
> 
> 
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
> > Some ASIC support low-power functionality for the whole ASIC or just
> > an IP block. When in such low-power mode, some sysfs interfaces would
> > report a frequency of 0, e.g.,
> >
> > $cat /sys/class/drm/card0/device/pp_dpm_sclk
> > 0: 500Mhz
> > 1: 0Mhz *
> > 2: 2200Mhz
> > $_
> >
> > An operating frequency of 0 MHz doesn't make sense, and this interface
> > is designed to report only operating clock frequencies, i.e. non-zero,
> > and possibly the current one.
> >
> > When in this low-power state, round to the smallest operating
> > frequency, for this interface, as follows,
> >
> 
> Would rather avoid this -
> 
> 1) It is manipulating FW reported value. If at all there is an uncaught issue 
> in
> FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this provides a
> convenient interface to check if GFX is power gated.
> 
> If seeing a '0' is not pleasing, consider changing to something like
>   "NA" - not available (frequency cannot be fetched at the moment).
> 
> Thanks,
> Lijo
> 
> > $cat /sys/class/drm/card0/device/pp_dpm_sclk
> > 0: 500Mhz *
> > 1: 2200Mhz
> > $_
> >
> > Luben Tuikov (5):
> >drm/amd/pm: Slight function rename
> >drm/amd/pm: Rename cur_value to curr_value
> >drm/amd/pm: Rename freq_values --> freq_value
> >dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
> >dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
> >
> >   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
> >   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 -
> --
> >   2 files changed, 86 insertions(+), 47 deletions(-)
> >


Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Alex Deucher
On Wed, Oct 13, 2021 at 12:14 AM Lazar, Lijo  wrote:
>
>
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
> > Some ASIC support low-power functionality for the whole ASIC or just
> > an IP block. When in such low-power mode, some sysfs interfaces would
> > report a frequency of 0, e.g.,
> >
> > $cat /sys/class/drm/card0/device/pp_dpm_sclk
> > 0: 500Mhz
> > 1: 0Mhz *
> > 2: 2200Mhz
> > $_
> >
> > An operating frequency of 0 MHz doesn't make sense, and this interface
> > is designed to report only operating clock frequencies, i.e. non-zero,
> > and possibly the current one.
> >
> > When in this low-power state, round to the smallest
> > operating frequency, for this interface, as follows,
> >
>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
> "NA" - not available (frequency cannot be fetched at the moment).
>

I don't think this interface is really supposed to be the way to get
the current clock, although some use it that way.  It's supposed to
show the DPM states and which are active.  conflating the interface
with other information confuses things in my opinion.  Why is the
current clock less than the minimum clock?  Whether or not an IP is
turned off or in deep sleep or not is independent of DPM states.  When
the IP is active, it will never go below the minimum DPM level.  If we
want to query deep sleep or gfxoff we can use the amdgpu_pm_info
debugfs interface or add some other new interface.

Alex


> Thanks,
> Lijo
>
> > $cat /sys/class/drm/card0/device/pp_dpm_sclk
> > 0: 500Mhz *
> > 1: 2200Mhz
> > $_
> >
> > Luben Tuikov (5):
> >drm/amd/pm: Slight function rename
> >drm/amd/pm: Rename cur_value to curr_value
> >drm/amd/pm: Rename freq_values --> freq_value
> >dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
> >dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
> >
> >   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
> >   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
> >   2 files changed, 86 insertions(+), 47 deletions(-)
> >


Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Lazar, Lijo




On 10/13/2021 7:28 PM, Alex Deucher wrote:

On Wed, Oct 13, 2021 at 12:14 AM Lazar, Lijo  wrote:




On 10/13/2021 8:40 AM, Luben Tuikov wrote:

Some ASIC support low-power functionality for the whole ASIC or just
an IP block. When in such low-power mode, some sysfs interfaces would
report a frequency of 0, e.g.,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz
1: 0Mhz *
2: 2200Mhz
$_

An operating frequency of 0 MHz doesn't make sense, and this interface
is designed to report only operating clock frequencies, i.e. non-zero,
and possibly the current one.

When in this low-power state, round to the smallest
operating frequency, for this interface, as follows,



Would rather avoid this -

1) It is manipulating FW reported value. If at all there is an uncaught
issue in FW reporting of frequency values, that is masked here.
2) Otherwise, if 0MHz is described as GFX power gated case, this
provides a convenient interface to check if GFX is power gated.

If seeing a '0' is not pleasing, consider changing to something like
 "NA" - not available (frequency cannot be fetched at the moment).



I don't think this interface is really supposed to be the way to get
the current clock, although some use it that way.  It's supposed to
show the DPM states and which are active.  conflating the interface
with other information confuses things in my opinion.  Why is the
current clock less than the minimum clock?  Whether or not an IP is
turned off or in deep sleep or not is independent of DPM states.  When
the IP is active, it will never go below the minimum DPM level.  If we
want to query deep sleep or gfxoff we can use the amdgpu_pm_info
debugfs interface or add some other new interface.



The idea of DPM level is deprecated with fine grained clock which 
provides only min and max. For fine grained clock, we fetch the current 
clock frequency and show it as an artificial DPM level between min/max. 
That itself should have confused users but it is not which means the 
users use the * to fetch the current frequency and not really bothered 
about the DPM level per se.


Also, some ASICs define 'min' as as the least possible freq (that 
happens during a throttle) and not the DPM level 0 min in the 
traditional sense (that is defined as idle frequency which doesn't come 
into min/max levels). It's usually from the idle frequency that GFX gets 
power gated. Showing a * against min in such cases would be confusing 
because that could be misinterpreted as a throttle scenario.


Thanks,
Lijo


Alex



Thanks,
Lijo


$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz *
1: 2200Mhz
$_

Luben Tuikov (5):
drm/amd/pm: Slight function rename
drm/amd/pm: Rename cur_value to curr_value
drm/amd/pm: Rename freq_values --> freq_value
dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency

   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
   2 files changed, 86 insertions(+), 47 deletions(-)



[PATCH 0/5] 0 MHz is not a valid current frequency (v2)

2021-10-13 Thread Luben Tuikov
Some ASIC support low-power functionality for the whole ASIC or just
an IP block. When in such low-power mode, some sysfs interfaces would
report a frequency of 0, e.g.,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz 
1: 0Mhz *
2: 2200Mhz 
$_

An operating frequency of 0 MHz doesn't make sense, and this interface
is designed to report only operating clock frequencies, i.e. non-zero,
and possibly the current one.

When in this low-power state, round to the smallest
operating frequency, for this interface, as follows,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz *
1: 2200Mhz 
$_

v2: Fix description to reflect change in patch 1--add an 's'.

Luben Tuikov (5):
  drm/amd/pm: Slight function rename (v2)
  drm/amd/pm: Rename cur_value to curr_value
  drm/amd/pm: Rename freq_values --> freq_value
  dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
  dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency

 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
 2 files changed, 86 insertions(+), 47 deletions(-)

-- 
2.33.1.558.g2bd2f258f4



Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Luben Tuikov
On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught 
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this 
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>   "NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200MHz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200MHz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>drm/amd/pm: Slight function rename
>>drm/amd/pm: Rename cur_value to curr_value
>>drm/amd/pm: Rename freq_values --> freq_value
>>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>



Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Lazar, Lijo
[AMD Official Use Only]

>Or maybe just a list without default hint, i.e. no asterisk?

I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.

Thanks,
Lijo

From: Tuikov, Luben 
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org 

Cc: Deucher, Alexander 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>"NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200MHz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200MHz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>drm/amd/pm: Slight function rename
>>drm/amd/pm: Rename cur_value to curr_value
>>drm/amd/pm: Rename freq_values --> freq_value
>>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>



Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-13 Thread Alex Deucher
On Wed, Oct 13, 2021 at 1:06 PM Lazar, Lijo  wrote:
>
> [AMD Official Use Only]
>
>
> >Or maybe just a list without default hint, i.e. no asterisk?
>
> I think this is also fine meaning we are having trouble in determining the 
> current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
> tools.
>

That seems reasonable to me.

Alex

> Thanks,
> Lijo
> 
> From: Tuikov, Luben 
> Sent: Wednesday, October 13, 2021 9:52:09 PM
> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org 
> 
> Cc: Deucher, Alexander 
> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
>
> On 2021-10-13 00:14, Lazar, Lijo wrote:
> >
> > On 10/13/2021 8:40 AM, Luben Tuikov wrote:
> >> Some ASIC support low-power functionality for the whole ASIC or just
> >> an IP block. When in such low-power mode, some sysfs interfaces would
> >> report a frequency of 0, e.g.,
> >>
> >> $cat /sys/class/drm/card0/device/pp_dpm_sclk
> >> 0: 500Mhz
> >> 1: 0Mhz *
> >> 2: 2200Mhz
> >> $_
> >>
> >> An operating frequency of 0 MHz doesn't make sense, and this interface
> >> is designed to report only operating clock frequencies, i.e. non-zero,
> >> and possibly the current one.
> >>
> >> When in this low-power state, round to the smallest
> >> operating frequency, for this interface, as follows,
> >>
> > Would rather avoid this -
> >
> > 1) It is manipulating FW reported value. If at all there is an uncaught
> > issue in FW reporting of frequency values, that is masked here.
> > 2) Otherwise, if 0MHz is described as GFX power gated case, this
> > provides a convenient interface to check if GFX is power gated.
> >
> > If seeing a '0' is not pleasing, consider changing to something like
> >"NA" - not available (frequency cannot be fetched at the moment).
>
> There's a ROCm tool which literally asserts if the values are not ordered in 
> increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
> tool simply asserts and crashes.
>
> It is not clear what you'd rather see here:
>
> $cat /sys/class/drm/card0/device/pp_dpm_sclk
> 0: 550Mhz
> 1: N/A *
> 2: 2200MHz
> $_
>
> Is this what you want to see? (That'll crash other tools which expect %uMhz.)
>
> Or maybe just a list without default hint, i.e. no asterisk?
>
> $cat /sys/class/drm/card0/device/pp_dpm_sclk
> 0: 550Mhz
> 1: 2200MHz
> $_
>
> What should the output be?
>
> We want to avoid showing 0, but still show numbers.
>
> Regards,
> Luben
>
> >
> > Thanks,
> > Lijo
> >
> >> $cat /sys/class/drm/card0/device/pp_dpm_sclk
> >> 0: 500Mhz *
> >> 1: 2200Mhz
> >> $_
> >>
> >> Luben Tuikov (5):
> >>drm/amd/pm: Slight function rename
> >>drm/amd/pm: Rename cur_value to curr_value
> >>drm/amd/pm: Rename freq_values --> freq_value
> >>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
> >>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
> >>
> >>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
> >>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
> >>   2 files changed, 86 insertions(+), 47 deletions(-)
> >>
>


[PATCH 0/5] 0 MHz is not a valid current frequency (v3)

2021-10-14 Thread Luben Tuikov
Some ASICs support low-power functionality for the whole ASIC or just
an IP block. When in such low-power mode, some sysfs interfaces would
report a frequency of 0, e.g.,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz 
1: 0Mhz *
2: 2200Mhz 
$_

An operating frequency of 0 MHz doesn't make sense, and this interface
is designed to report only operating clock frequencies, i.e. non-zero,
and possibly the current one.

When in this low-power state, round to the smallest
operating frequency, for this interface, as follows,

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 500Mhz *
1: 2200Mhz 
$_

v2: Fix description to reflect change in patch 1--add an 's'.
v3: Don't tag a current if current is 0.

Luben Tuikov (5):
  drm/amd/pm: Slight function rename (v2)
  drm/amd/pm: Rename cur_value to curr_value
  drm/amd/pm: Rename freq_values --> freq_value
  dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency (v2)
  dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency (v2)

 .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 57 --
 .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 74 ---
 2 files changed, 83 insertions(+), 48 deletions(-)


base-commit: b81c53cdbe1482b1f4013ba7a41bca2174cde109
-- 
2.33.1.558.g2bd2f258f4



RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-14 Thread Quan, Evan
[AMD Official Use Only]

+Kent who maintains the Rocm tool

From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]


[AMD Official Use Only]

>Or maybe just a list without default hint, i.e. no asterisk?

I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.

Thanks,
Lijo

From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>"NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200MHz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200MHz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>drm/amd/pm: Slight function rename
>>drm/amd/pm: Rename cur_value to curr_value
>>drm/amd/pm: Rename freq_values --> freq_value
>>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>


Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-18 Thread Deucher, Alexander
[Public]

We the current behavior (0 for clock) already crashes the tool, so I don't 
think we can really make things worse.

Alex


From: Quan, Evan 
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo ; Tuikov, Luben ; 
amd-gfx@lists.freedesktop.org ; Russell, Kent 

Cc: Deucher, Alexander 
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]



+Kent who maintains the Rocm tool



From: amd-gfx  On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



[AMD Official Use Only]



[AMD Official Use Only]



>Or maybe just a list without default hint, i.e. no asterisk?



I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.



Thanks,
Lijo



From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>"NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200MHz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200MHz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>drm/amd/pm: Slight function rename
>>drm/amd/pm: Rename cur_value to curr_value
>>drm/amd/pm: Rename freq_values --> freq_value
>>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>


Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-18 Thread Luben Tuikov

  
I think Kent is already seen these
  patches as he did comment on 1/5 patch.
  
  The v3 version of the patch, posted last week, removes the
  asterisk to report the lowest frequency as the current frequency,
  when the current frequency is 0, i.e. when the block is in low
  power state. Does the tool rely on the asterisk? If this
  information is necessary could it not use amdgpu_pm_info?
  
  Regards,
  Luben
  
  On 2021-10-18 16:19, Deucher, Alexander wrote:


  
  
  
[Public]
  
  
  

  We the current behavior (0 for clock) already crashes the
  tool, so I don't think we can really make things worse.

  


  Alex

  


From:
Quan, Evan 
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo ; Tuikov,
Luben ;
amd-gfx@lists.freedesktop.org
; Russell, Kent

Cc: Deucher, Alexander

Subject: RE: [PATCH 0/5] 0 MHz is not a valid current
    frequency
   



  
[AMD
Official Use Only]
 
+Kent who maintains the Rocm tool
 

  

  From: amd-gfx

On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben
;
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander

Subject: Re: [PATCH 0/5] 0 MHz is not a valid
    current frequency

  
   
  [AMD
  Official Use Only]
   
  
[AMD Official Use Only]
 

  
>Or maybe just a list
without default hint, i.e. no asterisk?
  
  
 
  
  
I think this is also fine
meaning we are having trouble in determining the
current frequency or DPM level. Evan/Alex? Don't
know if this will crash the tools.
  
  

   

Thanks,
  Lijo
  
  

  
  
From: Tuikov, Luben <luben.tui...@amd.com>
Sent: Wednesday, October 13, 2021 9:52:09
PM
To: Lazar, Lijo <lijo.la...@amd.com>;
amd-gfx@lists.freedesktop.org
<amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander <alexander.deuc...@amd.com>
                    Subject: Re: [PATCH 0/5] 0 MHz is not a
    valid current frequency 

   

  
  

  On 2021-10-13
00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power
functionality for the whole ASIC or just
>> an IP block. When in such low-power
mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat
/sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't
make sense, and this interface
>> is designed to report only operating
clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to
the smallest
>> operating frequency, for this
interface, as follows,
>>
> Would rather avoid t

RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-18 Thread Russell, Kent
[AMD Official Use Only]

+Harish, rocm-smi falls under his purview now.

Kent

From: Tuikov, Luben 
Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander ; Quan, Evan 
; Lazar, Lijo ; 
amd-gfx@lists.freedesktop.org; Russell, Kent 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

I think Kent is already seen these patches as he did comment on 1/5 patch.

The v3 version of the patch, posted last week, removes the asterisk to report 
the lowest frequency as the current frequency, when the current frequency is 0, 
i.e. when the block is in low power state. Does the tool rely on the asterisk? 
If this information is necessary could it not use amdgpu_pm_info?

Regards,
Luben

On 2021-10-18 16:19, Deucher, Alexander wrote:

[Public]

We the current behavior (0 for clock) already crashes the tool, so I don't 
think we can really make things worse.

Alex


From: Quan, Evan <mailto:evan.q...@amd.com>
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo <mailto:lijo.la...@amd.com>; Tuikov, Luben 
<mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>; Russell, 
Kent <mailto:kent.russ...@amd.com>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]



+Kent who maintains the Rocm tool



From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben <mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



[AMD Official Use Only]



[AMD Official Use Only]



>Or maybe just a list without default hint, i.e. no asterisk?



I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.



Thanks,
Lijo



From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>"NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'd rather see here:

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: N/A *
2: 2200MHz
$_

Is this what you want to see? (That'll crash other tools which expect %uMhz.)

Or maybe just a list without default hint, i.e. no asterisk?

$cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 550Mhz
1: 2200MHz
$_

What should the output be?

We want to avoid showing 0, but still show numbers.

Regards,
Luben

>
> Thanks,
> Lijo
>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz *
>> 1: 2200Mhz
>> $_
>>
>> Luben Tuikov (5):
>>drm/amd/pm: Slight function rename
>>drm/amd/pm: Rename cur_value to curr_value
>>drm/amd/pm: Rename freq_values --> freq_value
>>dpm/amd/pm: Sienna: 0 MHz is not a current clock frequency
>>dpm/amd/pm: Navi10: 0 MHz is not a current clock frequency
>>
>>   .../gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c   | 60 +--
>>   .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 73 ---
>>   2 files changed, 86 insertions(+), 47 deletions(-)
>>



RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-18 Thread Russell, Kent
[AMD Official Use Only]

The * is required for the rocm-smi's functionality for showing what the current 
clocks are. We had a bug before where the * was removed, then the SMI died 
fantastically. Work could be done to try to handle that type of situation, but 
the SMI has a "show current clocks" and uses the * to determine which one is 
active

Kent

From: amd-gfx  On Behalf Of Russell, Kent
Sent: Monday, October 18, 2021 9:05 PM
To: Tuikov, Luben ; Deucher, Alexander 
; Quan, Evan ; Lazar, Lijo 
; amd-gfx@lists.freedesktop.org
Cc: Kasiviswanathan, Harish 
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]

+Harish, rocm-smi falls under his purview now.

Kent

From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; Quan, Evan 
mailto:evan.q...@amd.com>>; Lazar, Lijo 
mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Russell, 
Kent mailto:kent.russ...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

I think Kent is already seen these patches as he did comment on 1/5 patch.

The v3 version of the patch, posted last week, removes the asterisk to report 
the lowest frequency as the current frequency, when the current frequency is 0, 
i.e. when the block is in low power state. Does the tool rely on the asterisk? 
If this information is necessary could it not use amdgpu_pm_info?

Regards,
Luben

On 2021-10-18 16:19, Deucher, Alexander wrote:

[Public]

We the current behavior (0 for clock) already crashes the tool, so I don't 
think we can really make things worse.

Alex


From: Quan, Evan <mailto:evan.q...@amd.com>
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo <mailto:lijo.la...@amd.com>; Tuikov, Luben 
<mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>; Russell, 
Kent <mailto:kent.russ...@amd.com>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]



+Kent who maintains the Rocm tool



From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben <mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



[AMD Official Use Only]



[AMD Official Use Only]



>Or maybe just a list without default hint, i.e. no asterisk?



I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.



Thanks,
Lijo



From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense, and this interface
>> is designed to report only operating clock frequencies, i.e. non-zero,
>> and possibly the current one.
>>
>> When in this low-power state, round to the smallest
>> operating frequency, for this interface, as follows,
>>
> Would rather avoid this -
>
> 1) It is manipulating FW reported value. If at all there is an uncaught
> issue in FW reporting of frequency values, that is masked here.
> 2) Otherwise, if 0MHz is described as GFX power gated case, this
> provides a convenient interface to check if GFX is power gated.
>
> If seeing a '0' is not pleasing, consider changing to something like
>"NA" - not available (frequency cannot be fetched at the moment).

There's a ROCm tool which literally asserts if the values are not ordered in 
increasing order. Now since 0 < 550, but 0 is listed as the second entry, the 
tool simply asserts and crashes.

It is not clear what you'

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-18 Thread Luben Tuikov

  
Kent,
  
  What is the command which fails?
  I can try to duplicate it here.
  
  So far, things I've tried, I cannot make rocm-smi fail. Command
  arguments?
  
  Regards,
  Luben
  
  On 2021-10-18 21:06, Russell, Kent wrote:


  
  
  
  
  
  
  
  
  
[AMD
Official Use Only]
 
The
* is required for the rocm-smi’s functionality for showing
what the current clocks are. We had a bug before where the *
was removed, then the SMI died fantastically. Work could be
done to try to handle that type of situation, but the SMI
has a “show current clocks” and uses the * to determine
which one is active
 
Kent
 

  

  From: amd-gfx
  
  On Behalf Of Russell, Kent
  Sent: Monday, October 18, 2021 9:05 PM
  To: Tuikov, Luben ;
  Deucher, Alexander ;
  Quan, Evan ; Lazar, Lijo
  ;
  amd-gfx@lists.freedesktop.org
  Cc: Kasiviswanathan, Harish
  
  Subject: RE: [PATCH 0/5] 0 MHz is not a valid
  current frequency

  
   
  [AMD
  Official Use Only]
   
  +Harish,
  rocm-smi falls under his purview now.
   
  Kent
   
  

  
From: Tuikov, Luben <luben.tui...@amd.com>

Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander <alexander.deuc...@amd.com>;
Quan, Evan <evan.q...@amd.com>;
Lazar, Lijo <lijo.la...@amd.com>;
amd-gfx@lists.freedesktop.org;
Russell, Kent <kent.russ...@amd.com>
                Subject: Re: [PATCH 0/5] 0 MHz is not a valid
    current frequency
  

 

  I think Kent is already seen these
  patches as he did comment on 1/5 patch.
  
  The v3 version of the patch, posted last week, removes
  the asterisk to report the lowest frequency as the
  current frequency, when the current frequency is 0,
  i.e. when the block is in low power state. Does the
  tool rely on the asterisk? If this information is
  necessary could it not use amdgpu_pm_info?
  
  Regards,
  Luben
  
  On 2021-10-18 16:19, Deucher, Alexander wrote:


  [Public]
   
  

  We the current
  behavior (0 for clock) already crashes the tool,
  so I don't think we can really make things worse.


   


  Alex


   



  

  From: Quan, Evan
  
  Sent: Thursday, October 14, 2021 10:25 PM
  To: Lazar, Lijo ;
  Tuikov, Luben
  ;
  
amd-gfx@lists.freedesktop.org 
;
  Russell, Kent 

  Cc: Deucher, Alexander 
                      Subject: RE: [PATCH 0/5] 0 MHz is not a
  valid current frequency
  
  
 
  


  
[AMD
Official Use Only]
 
+Kent who maintains the Rocm
  tool
 

  

  From:
amd-gfx 
 On Behalf Of Lazar,
Lijo
Sent: Thursday, October 14, 2021 1:07
AM
To: Tuikov, Luben ;
amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
                        Subject: Re: [PATCH 0/5] 0 MHz is not
        a valid current frequency

  
   
  [AMD
   

RE: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-19 Thread Russell, Kent
[AMD Official Use Only]

It was the rocm-smi -c flag. Maybe some work was done to make it more robust, 
that would be nice. But the -c flag is supposed to show the current frequency 
for each clock type. -g would do the same, but just for SCLK.

Kent

From: Tuikov, Luben 
Sent: Tuesday, October 19, 2021 12:27 AM
To: Russell, Kent ; Deucher, Alexander 
; Quan, Evan ; Lazar, Lijo 
; amd-gfx@lists.freedesktop.org
Cc: Kasiviswanathan, Harish 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

Kent,

What is the command which fails?
I can try to duplicate it here.

So far, things I've tried, I cannot make rocm-smi fail. Command arguments?

Regards,
Luben

On 2021-10-18 21:06, Russell, Kent wrote:

[AMD Official Use Only]

The * is required for the rocm-smi's functionality for showing what the current 
clocks are. We had a bug before where the * was removed, then the SMI died 
fantastically. Work could be done to try to handle that type of situation, but 
the SMI has a "show current clocks" and uses the * to determine which one is 
active

Kent

From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Russell, Kent
Sent: Monday, October 18, 2021 9:05 PM
To: Tuikov, Luben <mailto:luben.tui...@amd.com>; Deucher, 
Alexander <mailto:alexander.deuc...@amd.com>; Quan, 
Evan <mailto:evan.q...@amd.com>; Lazar, Lijo 
<mailto:lijo.la...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Kasiviswanathan, Harish 
<mailto:harish.kasiviswanat...@amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]

+Harish, rocm-smi falls under his purview now.

Kent

From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; Quan, Evan 
mailto:evan.q...@amd.com>>; Lazar, Lijo 
mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>; Russell, 
Kent mailto:kent.russ...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency

I think Kent is already seen these patches as he did comment on 1/5 patch.

The v3 version of the patch, posted last week, removes the asterisk to report 
the lowest frequency as the current frequency, when the current frequency is 0, 
i.e. when the block is in low power state. Does the tool rely on the asterisk? 
If this information is necessary could it not use amdgpu_pm_info?

Regards,
Luben

On 2021-10-18 16:19, Deucher, Alexander wrote:

[Public]

We the current behavior (0 for clock) already crashes the tool, so I don't 
think we can really make things worse.

Alex


From: Quan, Evan <mailto:evan.q...@amd.com>
Sent: Thursday, October 14, 2021 10:25 PM
To: Lazar, Lijo <mailto:lijo.la...@amd.com>; Tuikov, Luben 
<mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
<mailto:amd-gfx@lists.freedesktop.org>; Russell, 
Kent <mailto:kent.russ...@amd.com>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency


[AMD Official Use Only]



+Kent who maintains the Rocm tool



From: amd-gfx 
<mailto:amd-gfx-boun...@lists.freedesktop.org>
 On Behalf Of Lazar, Lijo
Sent: Thursday, October 14, 2021 1:07 AM
To: Tuikov, Luben <mailto:luben.tui...@amd.com>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Deucher, Alexander 
<mailto:alexander.deuc...@amd.com>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



[AMD Official Use Only]



[AMD Official Use Only]



>Or maybe just a list without default hint, i.e. no asterisk?



I think this is also fine meaning we are having trouble in determining the 
current frequency or DPM level. Evan/Alex? Don't know if this will crash the 
tools.



Thanks,
Lijo



From: Tuikov, Luben mailto:luben.tui...@amd.com>>
Sent: Wednesday, October 13, 2021 9:52:09 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



On 2021-10-13 00:14, Lazar, Lijo wrote:
>
> On 10/13/2021 8:40 AM, Luben Tuikov wrote:
>> Some ASIC support low-power functionality for the whole ASIC or just
>> an IP block. When in such low-power mode, some sysfs interfaces would
>> report a frequency of 0, e.g.,
>>
>> $cat /sys/class/drm/card0/device/pp_dpm_sclk
>> 0: 500Mhz
>> 1: 0Mhz *
>> 2: 2200Mhz
>> $_
>>
>> An operating frequency of 0 MHz doesn't make sense,

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-19 Thread Luben Tuikov

  
It again fails with the same message! 
  But this time it is different!
  Here's why:
  
  openat(AT_FDCWD,
"/sys/class/drm/card0/device/pp_dpm_fclk", O_RDONLY) = 3
read(3, "0: 571Mhz \n1: 1274Mhz *\n2: 1221M"...,
8191) = 36
read(3, "", 8191)   = 0
close(3)    = 0
write(2, "python3: /home/ltuikov/proj/amd/"..., 220python3:
/home/ltuikov/proj/amd/rocm_smi_lib/src/rocm_smi.cc:913:
rsmi_status_t get_frequencies(amd::smi::DevInfoTypes, uint32_t,
rsmi_frequencies_t*, uint32_t*): Assertion
`f->frequency[i-1] <= f->frequency[i]' failed.
) = 220
mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f531f9bc000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()    = 37861
gettid()    = 37861
tgkill(37861, 37861, SIGABRT)   = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=37861,
si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
$cat
/sys/class/drm/card0/device/pp_dpm_fclk
  0: 571Mhz 
  1: 1274Mhz *
  2: 1221Mhz 
  $_

  Why is the mid frequency larger than the last?
  Why does get_frequencies() insists that they be ordered when
  they're not? (Does the tool need fixing or the kernel?)
  
  The current patchset doesn't report 0, and doesn't report any
  current if 0 would've been reported as current. But anything else
  is reported as it would've been reported before the patch. And I
  tested it with vanilla amd-staging-drm-next--same thing.
  
  Regards,
  Luben
  
  
  On 2021-10-19 09:25, Russell, Kent wrote:


  
  
  
  
  
  
  
  
[AMD
Official Use Only]
 
It
was the rocm-smi -c flag. Maybe some work was done to make
it more robust, that would be nice. But the -c flag is
supposed to show the current frequency for each clock type.
-g would do the same, but just for SCLK.
 
Kent
 

  

  From: Tuikov, Luben 
  
  Sent: Tuesday, October 19, 2021 12:27 AM
  To: Russell, Kent ;
  Deucher, Alexander ;
  Quan, Evan ;
  Lazar, Lijo ;
  amd-gfx@lists.freedesktop.org
                  Cc: Kasiviswanathan, Harish 
      Subject: Re: [PATCH 0/5] 0 MHz is not a valid
  current frequency

  
   
  
Kent,

What is the command which fails?
I can try to duplicate it here.

So far, things I've tried, I cannot make rocm-smi fail.
Command arguments?

Regards,
Luben

On 2021-10-18 21:06, Russell, Kent wrote:
  
  
[AMD
Official Use Only]
 
The
* is required for the rocm-smi’s functionality for
showing what the current clocks are. We had a bug before
where the * was removed, then the SMI died
fantastically. Work could be done to try to handle that
type of situation, but the SMI has a “show current
clocks” and uses the * to determine which one is active
 
Kent
 

  

  From: amd-gfx 
  On Behalf Of Russell, Kent
  Sent: Monday, October 18, 2021 9:05 PM
  To: Tuikov, Luben ;
  Deucher, Alexander ;
  Quan, Evan ;
  Lazar, Lijo 
; amd-gfx@lists.freedesktop.org
  Cc: Kasiviswanathan, Harish 

      Subject: RE: [PATCH 0/5] 0 MHz is not a
  valid current frequency

  
   
  [AMD
  Official Use Only]
   
  +Harish, rocm-smi falls under his purview
  now.
   
  Kent
   
  

  
From: Tuikov, Luben <luben.tui...@amd.

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-26 Thread Alex Deucher
On Tue, Oct 19, 2021 at 9:54 AM Luben Tuikov  wrote:
>
> It again fails with the same message!
> But this time it is different!
> Here's why:
>
> openat(AT_FDCWD, "/sys/class/drm/card0/device/pp_dpm_fclk", O_RDONLY) = 3
> read(3, "0: 571Mhz \n1: 1274Mhz *\n2: 1221M"..., 8191) = 36
> read(3, "", 8191)   = 0
> close(3)= 0
> write(2, "python3: /home/ltuikov/proj/amd/"..., 220python3: 
> /home/ltuikov/proj/amd/rocm_smi_lib/src/rocm_smi.cc:913: rsmi_status_t 
> get_frequencies(amd::smi::DevInfoTypes, uint32_t, rsmi_frequencies_t*, 
> uint32_t*): Assertion `f->frequency[i-1] <= f->frequency[i]' failed.
> ) = 220
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> 0x7f531f9bc000
> rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
> rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
> getpid()= 37861
> gettid()= 37861
> tgkill(37861, 37861, SIGABRT)   = 0
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=37861, si_uid=1000} 
> ---
> +++ killed by SIGABRT (core dumped) +++
> Aborted (core dumped)
> $cat /sys/class/drm/card0/device/pp_dpm_fclk
> 0: 571Mhz
> 1: 1274Mhz *
> 2: 1221Mhz
> $_
>
> Why is the mid frequency larger than the last?
> Why does get_frequencies() insists that they be ordered when they're not? 
> (Does the tool need fixing or the kernel?)
>
> The current patchset doesn't report 0, and doesn't report any current if 0 
> would've been reported as current. But anything else is reported as it 
> would've been reported before the patch. And I tested it with vanilla 
> amd-staging-drm-next--same thing.
>

Seems to crash both ways.  I'd rather either:
1. Remove the * when the clock is outside of the min and max ranges
or
2. Clamp the clock to the max or min if it's above or below.

And then fix the tools accordingly.  Those seem like the choices of
least surprise considering the interface is supposed to show the
current and available DPM levels.

Alex


> Regards,
> Luben
>
>
> On 2021-10-19 09:25, Russell, Kent wrote:
>
> [AMD Official Use Only]
>
>
>
> It was the rocm-smi -c flag. Maybe some work was done to make it more robust, 
> that would be nice. But the -c flag is supposed to show the current frequency 
> for each clock type. -g would do the same, but just for SCLK.
>
>
>
> Kent
>
>
>
> From: Tuikov, Luben 
> Sent: Tuesday, October 19, 2021 12:27 AM
> To: Russell, Kent ; Deucher, Alexander 
> ; Quan, Evan ; Lazar, Lijo 
> ; amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
>
>
>
> Kent,
>
> What is the command which fails?
> I can try to duplicate it here.
>
> So far, things I've tried, I cannot make rocm-smi fail. Command arguments?
>
> Regards,
> Luben
>
> On 2021-10-18 21:06, Russell, Kent wrote:
>
> [AMD Official Use Only]
>
>
>
> The * is required for the rocm-smi’s functionality for showing what the 
> current clocks are. We had a bug before where the * was removed, then the SMI 
> died fantastically. Work could be done to try to handle that type of 
> situation, but the SMI has a “show current clocks” and uses the * to 
> determine which one is active
>
>
>
> Kent
>
>
>
> From: amd-gfx  On Behalf Of Russell, 
> Kent
> Sent: Monday, October 18, 2021 9:05 PM
> To: Tuikov, Luben ; Deucher, Alexander 
> ; Quan, Evan ; Lazar, Lijo 
> ; amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency
>
>
>
> [AMD Official Use Only]
>
>
>
> +Harish, rocm-smi falls under his purview now.
>
>
>
> Kent
>
>
>
> From: Tuikov, Luben 
> Sent: Monday, October 18, 2021 4:30 PM
> To: Deucher, Alexander ; Quan, Evan 
> ; Lazar, Lijo ; 
> amd-gfx@lists.freedesktop.org; Russell, Kent 
> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
>
>
>
> I think Kent is already seen these patches as he did comment on 1/5 patch.
>
> The v3 version of the patch, posted last week, removes the asterisk to report 
> the lowest frequency as the current frequency, when the current frequency is 
> 0, i.e. when the block is in low power state. Does the tool rely on the 
> asterisk? If this information is necessary could it not use amdgpu_pm_info?
>
> Regards,
> Luben
>
> On 2021-10-18 16:19, Deucher, Alexander wrote:
>
> [Public]
>
>
>
> We 

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-26 Thread Luben Tuikov
On 2021-10-26 17:26, Alex Deucher wrote:
> On Tue, Oct 19, 2021 at 9:54 AM Luben Tuikov  wrote:
>> It again fails with the same message!
>> But this time it is different!
>> Here's why:
>>
>> openat(AT_FDCWD, "/sys/class/drm/card0/device/pp_dpm_fclk", O_RDONLY) = 3
>> read(3, "0: 571Mhz \n1: 1274Mhz *\n2: 1221M"..., 8191) = 36
>> read(3, "", 8191)   = 0
>> close(3)= 0
>> write(2, "python3: /home/ltuikov/proj/amd/"..., 220python3: 
>> /home/ltuikov/proj/amd/rocm_smi_lib/src/rocm_smi.cc:913: rsmi_status_t 
>> get_frequencies(amd::smi::DevInfoTypes, uint32_t, rsmi_frequencies_t*, 
>> uint32_t*): Assertion `f->frequency[i-1] <= f->frequency[i]' failed.
>> ) = 220
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
>> 0x7f531f9bc000
>> rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
>> rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
>> getpid()= 37861
>> gettid()= 37861
>> tgkill(37861, 37861, SIGABRT)   = 0
>> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>> --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=37861, si_uid=1000} 
>> ---
>> +++ killed by SIGABRT (core dumped) +++
>> Aborted (core dumped)
>> $cat /sys/class/drm/card0/device/pp_dpm_fclk
>> 0: 571Mhz
>> 1: 1274Mhz *
>> 2: 1221Mhz
>> $_
>>
>> Why is the mid frequency larger than the last?
>> Why does get_frequencies() insists that they be ordered when they're not? 
>> (Does the tool need fixing or the kernel?)
>>
>> The current patchset doesn't report 0, and doesn't report any current if 0 
>> would've been reported as current. But anything else is reported as it 
>> would've been reported before the patch. And I tested it with vanilla 
>> amd-staging-drm-next--same thing.
>>
> Seems to crash both ways.  I'd rather either:
> 1. Remove the * when the clock is outside of the min and max ranges
> or
> 2. Clamp the clock to the max or min if it's above or below.
>
> And then fix the tools accordingly.  Those seem like the choices of
> least surprise considering the interface is supposed to show the
> current and available DPM levels.

So, if we get a case such as the one above, we remove the whole line at 1:, not 
just the asterisk, right? (for option 1 above).
The rocm-smi lib would fail if they're out of order, so only removing the * 
char would still confuse the tool.

What does it mean when the current frequency (line 1:) is above the "max" (line 
2:) as shown in my output above?

Do we perhaps want to sort them and report current still, and swap lines 1 and 
2?

Or should we simply remove the ordering requirement in rocm-smi lib?

Regards,
Luben

>
> Alex
>
>
>> Regards,
>> Luben
>>
>>
>> On 2021-10-19 09:25, Russell, Kent wrote:
>>
>> [AMD Official Use Only]
>>
>>
>>
>> It was the rocm-smi -c flag. Maybe some work was done to make it more 
>> robust, that would be nice. But the -c flag is supposed to show the current 
>> frequency for each clock type. -g would do the same, but just for SCLK.
>>
>>
>>
>> Kent
>>
>>
>>
>> From: Tuikov, Luben 
>> Sent: Tuesday, October 19, 2021 12:27 AM
>> To: Russell, Kent ; Deucher, Alexander 
>> ; Quan, Evan ; Lazar, Lijo 
>> ; amd-gfx@lists.freedesktop.org
>> Cc: Kasiviswanathan, Harish 
>> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
>>
>>
>>
>> Kent,
>>
>> What is the command which fails?
>> I can try to duplicate it here.
>>
>> So far, things I've tried, I cannot make rocm-smi fail. Command arguments?
>>
>> Regards,
>> Luben
>>
>> On 2021-10-18 21:06, Russell, Kent wrote:
>>
>> [AMD Official Use Only]
>>
>>
>>
>> The * is required for the rocm-smi’s functionality for showing what the 
>> current clocks are. We had a bug before where the * was removed, then the 
>> SMI died fantastically. Work could be done to try to handle that type of 
>> situation, but the SMI has a “show current clocks” and uses the * to 
>> determine which one is active
>>
>>
>>
>> Kent
>>
>>
>>
>> From: amd-gfx  On Behalf Of Russell, 
>> Kent
>> Sent: Monday, October 18, 2021 9:05 PM
>> To: Tuikov, Luben ; Deucher, Alexander 
>> ; Quan, Evan ; Lazar, Lijo 
>> ; amd-gfx@lists.freedesktop.org
>> 

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-26 Thread Lazar, Lijo




On 10/27/2021 3:30 AM, Luben Tuikov wrote:

On 2021-10-26 17:26, Alex Deucher wrote:

On Tue, Oct 19, 2021 at 9:54 AM Luben Tuikov  wrote:

It again fails with the same message!
But this time it is different!
Here's why:

openat(AT_FDCWD, "/sys/class/drm/card0/device/pp_dpm_fclk", O_RDONLY) = 3
read(3, "0: 571Mhz \n1: 1274Mhz *\n2: 1221M"..., 8191) = 36
read(3, "", 8191)   = 0
close(3)= 0
write(2, "python3: /home/ltuikov/proj/amd/"..., 220python3: 
/home/ltuikov/proj/amd/rocm_smi_lib/src/rocm_smi.cc:913: rsmi_status_t 
get_frequencies(amd::smi::DevInfoTypes, uint32_t, rsmi_frequencies_t*, uint32_t*): Assertion 
`f->frequency[i-1] <= f->frequency[i]' failed.
) = 220
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f531f9bc000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid()= 37861
gettid()= 37861
tgkill(37861, 37861, SIGABRT)   = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=37861, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)
$cat /sys/class/drm/card0/device/pp_dpm_fclk
0: 571Mhz
1: 1274Mhz *
2: 1221Mhz
$_

Why is the mid frequency larger than the last?
Why does get_frequencies() insists that they be ordered when they're not? (Does 
the tool need fixing or the kernel?)

The current patchset doesn't report 0, and doesn't report any current if 0 
would've been reported as current. But anything else is reported as it would've 
been reported before the patch. And I tested it with vanilla 
amd-staging-drm-next--same thing.


Seems to crash both ways.  I'd rather either:
1. Remove the * when the clock is outside of the min and max ranges
or
2. Clamp the clock to the max or min if it's above or below.

And then fix the tools accordingly.  Those seem like the choices of
least surprise considering the interface is supposed to show the
current and available DPM levels.


So, if we get a case such as the one above, we remove the whole line at 1:, not 
just the asterisk, right? (for option 1 above).
The rocm-smi lib would fail if they're out of order, so only removing the * 
char would still confuse the tool.

What does it mean when the current frequency (line 1:) is above the "max" (line 
2:) as shown in my output above?

Do we perhaps want to sort them and report current still, and swap lines 1 and 
2?

Or should we simply remove the ordering requirement in rocm-smi lib?

These nodes help to find the current state of ASIC. Clamping the values 
will just erase questions like these and possible improvements/fixes.


For ex: if the situation above is a case of OD (this is only example, 
don't know what is really the case above) that goes beyond regular DPM 
min/max levels, then we could add + as improvement.


IMO, the node should reflect the current state of ASIC and masking the 
values shouldn't be done. Other cases could be 0 or FW handshake 
failures where * itself will be missing.


Thanks,
Lijo


Regards,
Luben



Alex



Regards,
Luben


On 2021-10-19 09:25, Russell, Kent wrote:

[AMD Official Use Only]



It was the rocm-smi -c flag. Maybe some work was done to make it more robust, 
that would be nice. But the -c flag is supposed to show the current frequency 
for each clock type. -g would do the same, but just for SCLK.



Kent



From: Tuikov, Luben 
Sent: Tuesday, October 19, 2021 12:27 AM
To: Russell, Kent ; Deucher, Alexander ; 
Quan, Evan ; Lazar, Lijo ; 
amd-gfx@lists.freedesktop.org
Cc: Kasiviswanathan, Harish 
Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency



Kent,

What is the command which fails?
I can try to duplicate it here.

So far, things I've tried, I cannot make rocm-smi fail. Command arguments?

Regards,
Luben

On 2021-10-18 21:06, Russell, Kent wrote:

[AMD Official Use Only]



The * is required for the rocm-smi’s functionality for showing what the current 
clocks are. We had a bug before where the * was removed, then the SMI died 
fantastically. Work could be done to try to handle that type of situation, but 
the SMI has a “show current clocks” and uses the * to determine which one is 
active



Kent



From: amd-gfx  On Behalf Of Russell, Kent
Sent: Monday, October 18, 2021 9:05 PM
To: Tuikov, Luben ; Deucher, Alexander ; 
Quan, Evan ; Lazar, Lijo ; 
amd-gfx@lists.freedesktop.org
Cc: Kasiviswanathan, Harish 
Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency



[AMD Official Use Only]



+Harish, rocm-smi falls under his purview now.



Kent



From: Tuikov, Luben 
Sent: Monday, October 18, 2021 4:30 PM
To: Deucher, Alexander ; Quan, Evan ; Lazar, 
Lijo ; amd-gfx@lists.freedesktop.org; Russell, Kent 

Subject: Re: [PATCH 0/5] 

Re: [PATCH 0/5] 0 MHz is not a valid current frequency

2021-10-27 Thread Alex Deucher
mping works as well.  E.g., while SMU reports 0 in some
cases, that could be due to the clock being off or the block being
powergated.  It's still in the lowest state.  If the clock were on, it
would be in the lowest state.  Same thing on the high end.  It's
possible there is a little slop in the clock calculations for
stability.  E.g., the clock may be a few Mhz above the max because SMU
determined that the current max frequency was not stable in
combination with some other clock.  It's still ostensibly in the
highest DPM state.

Alex


>
> Thanks,
> Lijo
>
> > Regards,
> > Luben
> >
> >>
> >> Alex
> >>
> >>
> >>> Regards,
> >>> Luben
> >>>
> >>>
> >>> On 2021-10-19 09:25, Russell, Kent wrote:
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>> It was the rocm-smi -c flag. Maybe some work was done to make it more 
> >>> robust, that would be nice. But the -c flag is supposed to show the 
> >>> current frequency for each clock type. -g would do the same, but just for 
> >>> SCLK.
> >>>
> >>>
> >>>
> >>> Kent
> >>>
> >>>
> >>>
> >>> From: Tuikov, Luben 
> >>> Sent: Tuesday, October 19, 2021 12:27 AM
> >>> To: Russell, Kent ; Deucher, Alexander 
> >>> ; Quan, Evan ; Lazar, Lijo 
> >>> ; amd-gfx@lists.freedesktop.org
> >>> Cc: Kasiviswanathan, Harish 
> >>> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
> >>>
> >>>
> >>>
> >>> Kent,
> >>>
> >>> What is the command which fails?
> >>> I can try to duplicate it here.
> >>>
> >>> So far, things I've tried, I cannot make rocm-smi fail. Command arguments?
> >>>
> >>> Regards,
> >>> Luben
> >>>
> >>> On 2021-10-18 21:06, Russell, Kent wrote:
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>> The * is required for the rocm-smi’s functionality for showing what the 
> >>> current clocks are. We had a bug before where the * was removed, then the 
> >>> SMI died fantastically. Work could be done to try to handle that type of 
> >>> situation, but the SMI has a “show current clocks” and uses the * to 
> >>> determine which one is active
> >>>
> >>>
> >>>
> >>> Kent
> >>>
> >>>
> >>>
> >>> From: amd-gfx  On Behalf Of 
> >>> Russell, Kent
> >>> Sent: Monday, October 18, 2021 9:05 PM
> >>> To: Tuikov, Luben ; Deucher, Alexander 
> >>> ; Quan, Evan ; Lazar, Lijo 
> >>> ; amd-gfx@lists.freedesktop.org
> >>> Cc: Kasiviswanathan, Harish 
> >>> Subject: RE: [PATCH 0/5] 0 MHz is not a valid current frequency
> >>>
> >>>
> >>>
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>> +Harish, rocm-smi falls under his purview now.
> >>>
> >>>
> >>>
> >>> Kent
> >>>
> >>>
> >>>
> >>> From: Tuikov, Luben 
> >>> Sent: Monday, October 18, 2021 4:30 PM
> >>> To: Deucher, Alexander ; Quan, Evan 
> >>> ; Lazar, Lijo ; 
> >>> amd-gfx@lists.freedesktop.org; Russell, Kent 
> >>> Subject: Re: [PATCH 0/5] 0 MHz is not a valid current frequency
> >>>
> >>>
> >>>
> >>> I think Kent is already seen these patches as he did comment on 1/5 patch.
> >>>
> >>> The v3 version of the patch, posted last week, removes the asterisk to 
> >>> report the lowest frequency as the current frequency, when the current 
> >>> frequency is 0, i.e. when the block is in low power state. Does the tool 
> >>> rely on the asterisk? If this information is necessary could it not use 
> >>> amdgpu_pm_info?
> >>>
> >>> Regards,
> >>> Luben
> >>>
> >>> On 2021-10-18 16:19, Deucher, Alexander wrote:
> >>>
> >>> [Public]
> >>>
> >>>
> >>>
> >>> We the current behavior (0 for clock) already crashes the tool, so I 
> >>> don't think we can really make things worse.
> >>>
> >>>
> >>>
&