> On Oct 23, 2020, at 6:43 AM, Robin Murphy <robin.mur...@arm.com> wrote:
> 
> On 2020-10-22 22:46, Tuan Phan wrote:
> [...]
>>>> +#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi)		\
>>>> +	((((attr)->cfg) >> lo) & GENMASK(hi - lo, 0))
>>>
>>> As per the buildbot report, GENMASK_ULL() would be appropriate when
>>> the other side is a u64 (although either way this does look a lot
>>> like reinventing FIELD_GET()...)
>>
>> I will add (COMPILE_TEST && 64BIT) to Kconfig so it should fix the
>> buildbot report.
> 
> Huh? The left-hand side of the "&" expression will always be a u64
> here, which is unsigned long long. Regardless of whether an unsigned
> long on the right-hand side happens to be the same size, you have a
> semantic type mismatch, which is trivial to put right. I can't
> comprehend why introducing a fake build dependency to hide this would
> seem like a better idea than making a tiny change to make the code
> 100% correct and robust with zero practical impact :/
> 
> Sure, you only copied this from the SPE driver; that doesn't mean it
> was ever correct, simply that the mismatch was hidden since that
> driver *is* tightly coupled to one particular CPU ISA.

Got it. Actually, after seeing that your CMN driver has (COMPILE_TEST
&& 64BIT), I thought this driver should likewise only be compile-tested
on 64-bit platforms. Is there any reason why CMN needs 64BIT with
COMPILE_TEST?

> 
> [...]
>>>> +static irqreturn_t dmc620_pmu_handle_irq(int irq_num, void *data)
>>>> +{
>>>> +	struct dmc620_pmu_irq *irq = data;
>>>> +	struct dmc620_pmu *dmc620_pmu;
>>>> +	irqreturn_t ret = IRQ_NONE;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	list_for_each_entry_rcu(dmc620_pmu, &irq->pmus_node, pmus_node) {
>>>> +		unsigned long status;
>>>> +		struct perf_event *event;
>>>> +		unsigned int idx;
>>>> +
>>>> +		/*
>>>> +		 * HW doesn't provide a control to atomically disable all counters.
>>>> +		 * To prevent race condition, disable all events before continuing
>>>> +		 */
>>>
>>> I'm still doubtful that this doesn't introduce more inaccuracy
>>> overall than whatever it's trying to avoid... :/
>>
>> I think it does. By disabling all counters, you make sure the
>> overflow status doesn't change at the same time you are clearing it
>> (by writing zero) after reading all counters.
> 
> Urgh, *now* I get what the race is - we don't have a proper
> write-1-to-clear interrupt status register, so however much care we
> take in writing back to the overflow register there's always *some*
> risk of wiping out a new event when writing back, unless we ensure
> that no new overflows can occur *before* reading the status. What a
> horrible piece of hardware design... :(
> 
> Perhaps it's worth expanding the comment a little more, since
> apparently it's not super-obvious.

I will expand the comment more to explain the race condition.
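How about something like this? It's only a first draft based on your
explanation above, so wording suggestions are welcome:

	/*
	 * HW doesn't provide a control to atomically disable all
	 * counters, and the overflow status register is cleared by
	 * writing zero rather than being write-1-to-clear, so an
	 * overflow that arrives between reading the status and writing
	 * zero back would be silently lost. Disable all events first
	 * so that no new overflows can be raised until the status has
	 * been read and cleared.
	 */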
> 
> [...]
>>>> +	/*
>>>> +	 * We must NOT create groups containing mixed PMUs, although software
>>>> +	 * events are acceptable.
>>>> +	 */
>>>> +	if (event->group_leader->pmu != event->pmu &&
>>>> +	    !is_software_event(event->group_leader))
>>>> +		return -EINVAL;
>>>> +
>>>> +	for_each_sibling_event(sibling, event->group_leader) {
>>>> +		if (sibling->pmu != event->pmu &&
>>>> +		    !is_software_event(sibling))
>>>> +			return -EINVAL;
>>>> +	}
>>>
>>> As before, if you can't start, stop, or read multiple counters
>>> atomically, you can't really support groups of events for this PMU
>>> either. It's impossible to measure accurate ratios with a variable
>>> amount of skew between individual counters.
>>
>> Can you elaborate more? The only issue I know of is that we can't
>> stop all counters of the same PMU atomically in the IRQ handler to
>> prevent the race condition, but that can be fixed by manually
>> disabling each counter. Other than that, every counter works
>> independently.
> 
> The point of groups is to be able to count two or more events for the
> exact same time period, in order to measure ratios between them
> accurately. ->add, ->del, ->read, etc. are still called one at a time
> for each event in the group, but those calls are made as part of a
> transaction, which for most drivers is achieved by perf core calling
> ->pmu_disable and ->pmu_enable around the other calls. Since this
> driver doesn't have enable/disable functionality, the individual
> events will count for different lengths of time depending on what
> order those calls are made in (which is not necessarily constant),
> and how long each one takes. Thus you'll end up with an indeterminate
> amount of error between what each count represents, and the group is
> not really any more accurate than if the events were simply scheduled
> independently, which is not how it's supposed to work.
> 
> Come to think of it, you're also not validating that groups are even
> schedulable - try something like:
> 
> perf stat -e '{arm_dmc620_10008c000/clk_cycle_count/,arm_dmc620_10008c000/clk_request/,arm_dmc620_10008c000/clk_upload_stall/}' sleep 5
> 
> and observe perf core being very confused and unhappy when ->add
> starts failing for a group that ->event_init said was OK, since 3
> events won't actually fit into the 2 available counters.
> 
> As I said before, I think you probably would be able to make groups
> work with some careful hooking up of snapshot functionality to
> ->start_txn and ->commit_txn, but to start with it'll be an awful lot
> simpler to just be honest and reject them.

Got it. Thanks for educating me. I will allow only one HW event per
group then.

> 
> [...]
>>>> +	name = devm_kasprintf(&pdev->dev, GFP_KERNEL,
>>>> +			      "%s_%llx", DMC620_PMUNAME,
>>>> +			      (res->start) >> DMC620_PA_SHIFT);
>>>
>>> res->start doesn't need parentheses, however I guess it might need
>>> casting to u64 to solve the build warning (I'm not sure there's any
>>> nicer way, unfortunately).
>>
>> I will remove those parentheses; we don't need the u64 cast, as the
>> build warning only shows up when compile-testing on 32-bit.
> 
> As above, deliberately hacking the build for the sake of not fixing
> clearly dodgy code is crazy. The only correct format specifier for an
> expression of type phys_addr_t/resource_size_t is "%pa"; if you want
> to use a different format then explicitly converting the argument to
> a type appropriate for that format (either via a simple cast or an
> intermediate variable) is indisputably correct, regardless of whether
> you might happen to get away with an implicit conversion sometimes.
> 
> The whole point of maximising COMPILE_TEST coverage is to improve
> code quality in order to avoid this exact situation, wherein someone
> copies a pattern from an existing driver only to discover that it's
> not actually as robust as it should be.

Got it. Will fix it.

Thanks,
Tuan.

> 
> Robin.
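P.S. For the event group restriction, I'm thinking of something along
these lines - an untested sketch that keeps the shape of the existing
check but compares against the event itself, so any group containing a
second hardware event is rejected in event_init():

	/*
	 * We can't atomically disable all HW counters, so only one
	 * hardware event is allowed per group, although software
	 * events are acceptable.
	 */
	if (event->group_leader != event &&
	    !is_software_event(event->group_leader))
		return -EINVAL;

	for_each_sibling_event(sibling, event->group_leader) {
		if (sibling != event &&
		    !is_software_event(sibling))
			return -EINVAL;
	}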