On 5/28/2018 1:09 AM, George Cherian wrote:
> Hi Prashanth,
>
> On 05/26/2018 02:30 AM, Prakash, Prashanth wrote:
>>
>> On 5/25/2018 12:27 AM, George Cherian wrote:
>>> Hi Prashanth,
>>>
>>> On 05/25/2018 12:55 AM, Prakash, Prashanth wrote:
>>>> Hi George,
>>>>
>>>> On 5/22/2018 5:42 AM, George Cherian wrote:
>>>>> Per Section 8.4.7.1.3 of ACPI 6.2, the platform provides performance
>>>>> feedback via a set of performance counters. To determine the actual
>>>>> performance level delivered over time, OSPM may read a set of
>>>>> performance counters from the Reference Performance Counter Register
>>>>> and the Delivered Performance Counter Register.
>>>>>
>>>>> OSPM calculates the delivered performance over a given time period by
>>>>> taking a beginning and ending snapshot of both the reference and
>>>>> delivered performance counters, and calculating:
>>>>>
>>>>> delivered_perf = reference_perf X (delta of delivered_perf counter /
>>>>> delta of reference_perf counter).
>>>>>
>>>>> Implement the above and hook this to the cpufreq->get method.
>>>>>
>>>>> Signed-off-by: George Cherian <[email protected]>
>>>>> ---
>>>>> drivers/cpufreq/cppc_cpufreq.c | 44 ++++++++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 44 insertions(+)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>>> index b15115a..a046915 100644
>>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>>> @@ -240,10 +240,54 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
>>>>> return ret;
>>>>> }
>>>>> +static int cppc_get_rate_from_fbctrs(struct cppc_perf_fb_ctrs fb_ctrs_t0,
>>>>> + struct cppc_perf_fb_ctrs fb_ctrs_t1)
>>>>> +{
>>>>> + u64 delta_reference, delta_delivered;
>>>>> + u64 reference_perf, ratio;
>>>>> +
>>>>> + reference_perf = fb_ctrs_t0.reference_perf;
>>>>> + if (fb_ctrs_t1.reference > fb_ctrs_t0.reference)
>>>>> + delta_reference = fb_ctrs_t1.reference - fb_ctrs_t0.reference;
>>>>> + else /* Counters would have wrapped-around */
>>>>> + delta_reference = ((u64)(~((u64)0)) - fb_ctrs_t0.reference) +
>>>>> + fb_ctrs_t1.reference;
>>>>> +
>>>>> + if (fb_ctrs_t1.delivered > fb_ctrs_t0.delivered)
>>>>> + delta_delivered = fb_ctrs_t1.delivered - fb_ctrs_t0.delivered;
>>>>> + else /* Counters would have wrapped-around */
>>>>> + delta_delivered = ((u64)(~((u64)0)) - fb_ctrs_t0.delivered) +
>>>>> + fb_ctrs_t1.delivered;
>>>> We need to check that the wraparound time is long enough to make sure that
>>>> the counters cannot wrap around more than once. We can register a get() API
>>>> only after checking that the wraparound time value is reasonably high.
>>>>
>>>> I am not aware of any platforms where the wraparound time is so short, but
>>>> it wouldn't hurt to check once during init.
>>> By design the wraparound time is a 64-bit counter; for that matter, all
>>> the feedback counters are 64-bit counters too. I don't see any
>>> chance of the counters wrapping around twice between back-to-back reads.
>>> The only situation would be a system running at a really high
>>> frequency, and even in that case today's spec is not sufficient to
>>> support it.
>>
>> The spec doesn't say these have to be 64-bit registers. The wraparound
>> counter register is in the spec to communicate the worst-case (shortest)
>> counter rollover time.
>
> Spec says these are 32 or 64 bit registers. Spec also defines the counter
> wraparound time in seconds. The minimum value possible is 1, as zero means the
> counters are never assumed to wrap around. Even on platforms with the value
> set to 1 (1 sec) I don't really see a situation in which
> the counter can wrap around twice if we are putting a delay of 2 usec
> between samplings.
ok.
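FWIW, if the counters really are full-width 64-bit as you describe, the
explicit ~((u64)0) arithmetic in the patch can be collapsed: plain
unsigned subtraction already yields the right delta across a single
wraparound. A quick userspace sketch (uint64_t standing in for the
kernel's u64):

```c
#include <stdint.h>

/*
 * Userspace sketch: with modulo-2^64 unsigned arithmetic, t1 - t0 is
 * the correct delta even when the counter wrapped once between reads.
 */
static uint64_t ctr_delta(uint64_t t0, uint64_t t1)
{
	return t1 - t0;
}
```

This only holds if the counter register is actually 64 bits wide; a
32-bit counter would need its deltas masked to 32 bits first.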
>
>>
>> As mentioned before, this is just a defensive check to make sure that
>> the platform has not set it to some very low number (which is allowed
>> by the spec).
> It might be unnecessary to have a check like this.
>>
>>>
>>>>> +
>>>>> + if (delta_reference) /* Check to avoid divide-by zero */
>>>>> + ratio = (delta_delivered * 1000) / delta_reference;
>>>> Why not just return the computed value here instead of *1000 and later
>>>> /1000?
>>>> return (ref_per * delta_del) / delta_ref;
>>> Yes.
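For reference, the collapsed computation could look like this
(userspace sketch, uint64_t for the kernel's u64; note that
ref_perf * delta_delivered can itself overflow for very large deltas,
which may need a guard on real hardware):

```c
#include <stdint.h>

/* Sketch: delivered perf without the *1000 ... /1000 round trip. */
static uint64_t delivered_perf(uint64_t ref_perf,
			       uint64_t delta_delivered,
			       uint64_t delta_reference)
{
	if (!delta_reference)		/* avoid divide-by-zero */
		return 0;		/* caller substitutes a fallback */
	return (ref_perf * delta_delivered) / delta_reference;
}
```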
>>>>> + else
>>>>> + return -EINVAL;
>>>> Instead of EINVAL, I think we should return the current frequency.
>>>>
>>> Sorry, I didn't get you, How do you calculate the current frequency?
>>> Did you mean reference performance?
>> I mean the performance that OSPM/Linux had requested earlier.
>> i.e the desired_perf
> Okay, I will make necessary changes for this in v2.
>
>>>
>>>> The counters can pause if CPUs are in an idle state during our sampling
>>>> interval, so if the counters did not progress, it is reasonable to assume
>>>> the delivered perf was equal to the desired perf.
>>> No, that is wrong. Here the check is for the reference performance delta.
>>> This counter can never pause. In case of cpuidle only the delivered
>>> counters could pause. Delivered counters will pause only if the particular
>>> core enters power-down mode; otherwise we would still be clocking the core
>>> and we should be getting a delta across 2 sampling periods. If the
>>> reference counter is paused, which by design is not correct, then there is
>>> no point in returning reference performance numbers; that too is wrong. If
>>> the low-level FW is not updating the
>>> counters properly, then that should be evident to Linux, instead of
>>> returning a bogus frequency.
>>
>> Again you are describing how it works on a specific platform and not
>> how it is described in the spec. Section 8.4.7.1.3.1.1 of ACPI 6.2 states
>> "The Reference Performance Counter Register counts at a fixed rate
>> any time the processor is active."
>> This implies the counters *may* pause in idle state. I can imagine an
>> implementation where you can keep this counter running and
>> account for it via the delivered counter, but we cannot make any
>> assumptions outside of what the spec describes.
>>
>>>>
>>>> Even if the platform wanted to limit, since the CPUs were asleep (idle) we
>>>> could not have observed lower performance, so we will not throw off any
>>>> logic that could be driven using the returned value.
>>>>> +
>>>>> + return (reference_perf * ratio) / 1000;
>>>> This should be converted to KHz as cpufreq is not aware of the CPPC
>>>> abstract scale.
>>> On our platform all performance registers are implemented in KHz, because
>>> of which we never had an issue with conversion. I am not
>>> aware whether ACPI mandates any particular unit. How is that
>>> implemented on your platform? Just to avoid any extra conversion, don't
>>> you feel it is better to always report in KHz from firmware?
>> Again, think of the spec, not a specific platform :)
>> - The CPPC spec works on an abstract scale and cpufreq works in KHz.
>> - The above computed value is in the abstract scale
>> - The abstract scale may be in KHz on your platform, but we cannot assume
>> the same about all the platforms
> For now can I assume it to be in KHz only?
No, it will break platforms where abstract scale is not in KHz.
> I am not sure how to convert the abstract scale to KHz.
> Can you please give me some pointers on the same?
Take a look at cppc_cpufreq_perf_to_khz and cppc_cpufreq_khz_to_perf
in the same file (cppc_cpufreq.c). We use this in almost every function
registered with core cpufreq.
> In the spec there is currently no interface which tells what the
> abstract scale is!!
CPPC v3 adds some additional hooks for this. On CPPC v2, we try to use a
few DMI table entries to get the ratio between the abstract scale and KHz.
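Roughly, the conversion is a linear scaling by a precomputed ratio.
A simplified userspace sketch of the idea (parameter names are mine,
not the kernel's; the real cppc_cpufreq_perf_to_khz derives the max
frequency from DMI and the highest_perf capability):

```c
#include <stdint.h>

/*
 * Simplified sketch of abstract-perf -> KHz scaling: the delivered
 * perf value is scaled by max_khz / highest_perf.
 */
static uint64_t perf_to_khz(uint64_t perf, uint64_t highest_perf,
			    uint64_t max_khz)
{
	return (perf * max_khz) / highest_perf;
}
```

So the value computed from the feedback counters would go through a
conversion like this before being handed back to the cpufreq core.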
>>>
>>>>> +}
>>>>> +
>>>>> +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>>>>> +{
>>>>> + struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
>>>>> + int ret;
>>>>> +
>>>>> + ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t0);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + ret = cppc_get_perf_ctrs(cpunum, &fb_ctrs_t1);
>>>>> + if (ret)
>>>>> + return ret;
>>>>> +
>>>>> + return cppc_get_rate_from_fbctrs(fb_ctrs_t0, fb_ctrs_t1);
>>>>> +}
>>>> We need to make sure that we get a reasonable sample to make sure the
>>>> reported performance is accurate.
>>>> The counters can capture short transient throttling/limiting, so by
>>>> sampling a really short duration of time we could return a quite
>>>> inaccurate measure of performance.
>>>>
>>> I would say that is only a momentary thing, when the frequency is being
>>> ramped up/down.
>> This exact behavior would depend on how different limiting functions are
>> implemented.
>> So this would vary from one platform to another.
>>>
>>>> We need to include some reasonable delay between the two calls. What is
>>>> reasonable is debatable, so this can be a few (2-10) microseconds defined
>>>> as a default. If the same value doesn't work for all the platforms, we
>>>> might need to add a platform-specific value.
>>>>
>>> cppc_get_perf_ctrs itself is a slow call; we have to format the CPC packet
>>> and ring a doorbell, and then the response has to be read from the shared
>>> registers. My initial implementation had a delay, but in testing
>>> I found that it was unnecessary. Can you please
>>> let me know whether it works without a delay on your platform?
>>>
>>> Or else let me know whether udelay(10) is sufficient in between the
>>> calls.
>> Feedback counters need not be in PCC.
>> 2us should be sufficient.
> Yes I will add this to v2.
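The two-snapshot sampling with a small delay would then look roughly
like this userspace analog (a monotonic clock stands in for
cppc_get_perf_ctrs(), and nanosleep() for udelay(2)):

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <stddef.h>
#include <time.h>

/* Read a monotonic "counter", standing in for a feedback counter. */
static uint64_t read_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Snapshot, wait ~2us, snapshot again, and return the delta. */
static uint64_t sample_delta(void)
{
	struct timespec delay;
	uint64_t t0, t1;

	delay.tv_sec = 0;
	delay.tv_nsec = 2000;		/* 2us, like udelay(2) */
	t0 = read_ns();
	nanosleep(&delay, NULL);
	t1 = read_ns();
	return t1 - t0;
}
```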
>>>>> +
>>>>> static struct cpufreq_driver cppc_cpufreq_driver = {
>>>>> .flags = CPUFREQ_CONST_LOOPS,
>>>>> .verify = cppc_verify_policy,
>>>>> .target = cppc_cpufreq_set_target,
>>>>> + .get = cppc_cpufreq_get_rate,
>>>>> .init = cppc_cpufreq_cpu_init,
>>>>> .stop_cpu = cppc_cpufreq_stop_cpu,
>>>>> .name = "cppc_cpufreq",
>>>>
>>