On 2020-10-22 22:46, Tuan Phan wrote:
[...]
+#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi)                   \
+       ((((attr)->cfg) >> lo) & GENMASK(hi - lo, 0))

As per the buildbot report, GENMASK_ULL() would be appropriate when the other 
side is a u64 (although either way this does look a lot like reinventing 
FIELD_GET()...)

I will add (COMPILE_TEST && 64BIT) to Kconfig, which should fix the buildbot 
report.

Huh? The left-hand side of the "&" expression will always be a u64 here, which is unsigned long long. Regardless of whether an unsigned long on the right-hand side happens to be the same size, you have a semantic type mismatch, which is trivial to put right. I can't comprehend why introducing a fake build dependency to hide this would seem like a better idea than making a tiny change to make the code 100% correct and robust with zero practical impact :/

Sure, you only copied this from the SPE driver; that doesn't mean it was ever correct, simply that the mismatch was hidden since that driver *is* tightly coupled to one particular CPU ISA.
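
For the record, the tiny change amounts to no more than this (just a sketch; 
I've also parenthesised the lo/hi parameters while at it, for macro hygiene):

#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi)                   \
	((((attr)->cfg) >> (lo)) & GENMASK_ULL((hi) - (lo), 0))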

[...]
+static irqreturn_t dmc620_pmu_handle_irq(int irq_num, void *data)
+{
+       struct dmc620_pmu_irq *irq = data;
+       struct dmc620_pmu *dmc620_pmu;
+       irqreturn_t ret = IRQ_NONE;
+
+       rcu_read_lock();
+       list_for_each_entry_rcu(dmc620_pmu, &irq->pmus_node, pmus_node) {
+               unsigned long status;
+               struct perf_event *event;
+               unsigned int idx;
+
+               /*
+                * HW doesn't provide a control to atomically disable all counters.
+                * To prevent race condition, disable all events before continuing
+                */

I'm still doubtful that this doesn't introduce more inaccuracy overall than 
whatever it's trying to avoid... :/

I think it does. By disabling all counters, you make sure the overflow status 
doesn't change at the same time you are clearing it (by writing zero) after 
reading all counters.

Urgh, *now* I get what the race is - we don't have a proper write-1-to-clear interrupt status register, so however much care we take in writing back to the overflow register there's always *some* risk of wiping out a new event when writing back, unless we ensure that no new overflows can occur *before* reading the status. What a horrible piece of hardware design... :(

Perhaps it's worth expanding the comment a little more, since apparently it's not super-obvious.
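
Something along these lines, perhaps (just a sketch of the points worth 
capturing):

	/*
	 * The overflow status register is not write-1-to-clear, so we have
	 * to write back a whole value; any overflow which lands between our
	 * read and our write would be silently lost. The only way to make
	 * that read-modify-write safe is to stop all the counters first so
	 * that no new overflows can be raised, then handle and clear the
	 * status before re-enabling them.
	 */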

[...]
+       /*
+        * We must NOT create groups containing mixed PMUs, although software
+        * events are acceptable.
+        */
+       if (event->group_leader->pmu != event->pmu &&
+                       !is_software_event(event->group_leader))
+               return -EINVAL;
+
+       for_each_sibling_event(sibling, event->group_leader) {
+               if (sibling->pmu != event->pmu &&
+                               !is_software_event(sibling))
+                       return -EINVAL;
+       }

As before, if you can't start, stop, or read multiple counters atomically, you 
can't really support groups of events for this PMU either. It's impossible to 
measure accurate ratios with a variable amount of skew between individual 
counters.

Can you elaborate more? The only issue I know of is that we can't stop all 
counters of the same PMU atomically in the IRQ handler to prevent the race 
condition, but that can be fixed by manually disabling each counter. Other 
than that, all counters work independently.

The point of groups is to be able to count two or more events for the exact same time period, in order to measure ratios between them accurately. ->add, ->del, ->read, etc. are still called one at a time for each event in the group, but those calls are made as part of a transaction, which for most drivers is achieved by perf core calling ->pmu_disable and ->pmu_enable around the other calls. Since this driver doesn't have enable/disable functionality, the individual events will count for different lengths of time depending on what order those calls are made in (which is not necessarily constant), and how long each one takes. Thus you'll end up with an indeterminate amount of error between what each count represents, and the group is not really any more accurate than if the events were simply scheduled independently, which is not how it's supposed to work.
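
To make that concrete, the bracketing that most drivers get looks roughly like 
the below - sketch only, assuming struct dmc620_pmu embeds its struct pmu as 
"pmu", and that hypothetical dmc620_pmu_stop_all()/dmc620_pmu_start_all() 
helpers exist to freeze and unfreeze every counter on the instance:

static void dmc620_pmu_disable(struct pmu *pmu)
{
	/* container_of() back to the driver instance (assumed layout) */
	struct dmc620_pmu *dmc620_pmu = container_of(pmu, struct dmc620_pmu, pmu);

	dmc620_pmu_stop_all(dmc620_pmu);	/* hypothetical helper */
}

static void dmc620_pmu_enable(struct pmu *pmu)
{
	struct dmc620_pmu *dmc620_pmu = container_of(pmu, struct dmc620_pmu, pmu);

	dmc620_pmu_start_all(dmc620_pmu);	/* hypothetical helper */
}

and in the struct pmu initialiser:

	.pmu_enable	= dmc620_pmu_enable,
	.pmu_disable	= dmc620_pmu_disable,

Perf core then brackets the per-event ->add/->del/->read calls with those, so 
every event in a group ends up covering the same window.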

Come to think of it, you're also not validating that groups are even schedulable - try something like:

perf stat -e '{arm_dmc620_10008c000/clk_cycle_count/,arm_dmc620_10008c000/clk_request/,arm_dmc620_10008c000/clk_upload_stall/}' sleep 5

and observe perf core being very confused and unhappy when ->add starts failing for a group that ->event_init said was OK, since 3 events won't actually fit into the 2 available counters.
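
A typical way to catch that in ->event_init is a simple counting check, 
something like this (sketch only; DMC620_PMU_MAX_COUNTERS is a stand-in for 
however many counters the relevant class of event actually has):

static int dmc620_pmu_validate_group(struct perf_event *event)
{
	struct perf_event *leader = event->group_leader;
	struct perf_event *sibling;
	int counters = 1;	/* the event being initialised */

	if (leader != event && leader->pmu == event->pmu)
		counters++;

	for_each_sibling_event(sibling, leader) {
		if (sibling->pmu == event->pmu)
			counters++;
	}

	/* Refuse groups which could never be scheduled together */
	return counters > DMC620_PMU_MAX_COUNTERS ? -EINVAL : 0;
}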

As I said before, I think you probably would be able to make groups work with some careful hooking up of snapshot functionality to ->start_txn and ->commit_txn, but to start with it'll be an awful lot simpler to just be honest and reject them.
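
Rejecting them is then just a matter of tightening the existing check so that 
*any* other hardware event in the group is refused, e.g. (sketch):

	if (event->group_leader != event &&
	    !is_software_event(event->group_leader))
		return -EINVAL;

	for_each_sibling_event(sibling, event->group_leader) {
		if (!is_software_event(sibling))
			return -EINVAL;
	}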

[...]
+       name = devm_kasprintf(&pdev->dev, GFP_KERNEL,
+                                 "%s_%llx", DMC620_PMUNAME,
+                                 (res->start) >> DMC620_PA_SHIFT);

res->start doesn't need parentheses; however, I guess it might need casting to 
u64 to solve the build warning (I'm not sure there's any nicer way, unfortunately).

I will remove those parentheses; we don't need the u64 cast, as the build 
warning only appears when compile-testing on 32-bit.

As above, deliberately hacking the build for the sake of not fixing clearly dodgy code is crazy. The only correct format specifier for an expression of type phys_addr_t/resource_size_t is "%pa"; if you want to use a different format then explicitly converting the argument to a type appropriate for that format (either via a simple cast or an intermediate variable) is indisputably correct, regardless of whether you might happen to get away with an implicit conversion sometimes.
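
Concretely, something like this would do (sketch only; since the value is 
shifted, the explicit conversion is the option that applies here, rather than 
"%pa", which takes a pointer and prints the raw address):

	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s_%llx",
			      DMC620_PMUNAME,
			      (u64)(res->start >> DMC620_PA_SHIFT));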

The whole point of maximising COMPILE_TEST coverage is to improve code quality in order to avoid this exact situation, wherein someone copies a pattern from an existing driver only to discover that it's not actually as robust as it should be.

Robin.
