> On Oct 23, 2020, at 6:43 AM, Robin Murphy <robin.mur...@arm.com> wrote:
> 
> On 2020-10-22 22:46, Tuan Phan wrote:
> [...]
>>>> +#define _ATTR_CFG_GET_FLD(attr, cfg, lo, hi)		\
>>>> +	((((attr)->cfg) >> lo) & GENMASK(hi - lo, 0))
>>>
>>> As per the buildbot report, GENMASK_ULL() would be appropriate when
>>> the other side is a u64 (although either way this does look a lot
>>> like reinventing FIELD_GET()...)
>>
>> I will add (COMPILE_TEST && 64BIT) to Kconfig so it should fix the
>> buildbot report.
> 
> Huh? The left-hand side of the "&" expression will always be a u64
> here, which is unsigned long long. Regardless of whether an unsigned
> long on the right-hand side happens to be the same size, you have a
> semantic type mismatch, which is trivial to put right. I can't
> comprehend why introducing a fake build dependency to hide this would
> seem like a better idea than making a tiny change to make the code
> 100% correct and robust with zero practical impact :/
> 
> Sure, you only copied this from the SPE driver; that doesn't mean it
> was ever correct, simply that the mismatch was hidden since that
> driver *is* tightly coupled to one particular CPU ISA.

Got it. Actually, after seeing that your CMN driver has (COMPILE_TEST
&& 64BIT), I thought this driver should likewise only be compile-tested
on 64-bit platforms. Is there any reason why CMN needs 64BIT with
COMPILE_TEST?

> 
> [...]
>>>> +static irqreturn_t dmc620_pmu_handle_irq(int irq_num, void *data)
>>>> +{
>>>> +	struct dmc620_pmu_irq *irq = data;
>>>> +	struct dmc620_pmu *dmc620_pmu;
>>>> +	irqreturn_t ret = IRQ_NONE;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	list_for_each_entry_rcu(dmc620_pmu, &irq->pmus_node, pmus_node) {
>>>> +		unsigned long status;
>>>> +		struct perf_event *event;
>>>> +		unsigned int idx;
>>>> +
>>>> +		/*
>>>> +		 * HW doesn't provide a control to atomically disable all counters.
>>>> +		 * To prevent race condition, disable all events before continuing
>>>> +		 */
>>>
>>> I'm still doubtful that this doesn't introduce more inaccuracy
>>> overall than whatever it's trying to avoid... :/
>>
>> I think it does. By disabling all counters, you make sure the
>> overflow status doesn't change at the same time you are clearing it
>> (by writing zero) after reading all counters.
> 
> Urgh, *now* I get what the race is - we don't have a proper
> write-1-to-clear interrupt status register, so however much care we
> take in writing back to the overflow register there's always *some*
> risk of wiping out a new event when writing back, unless we ensure
> that no new overflows can occur *before* reading the status. What a
> horrible piece of hardware design... :(
> 
> Perhaps it's worth expanding the comment a little more, since
> apparently it's not super-obvious.

I will expand the comment more to explain the race condition.
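How about something like this? It's only a first draft based on your
explanation above, so wording suggestions are welcome:

	/*
	 * HW doesn't provide a control to atomically disable all
	 * counters, and the overflow status register is cleared by
	 * writing zero rather than being write-1-to-clear, so an
	 * overflow that arrives between reading the status and writing
	 * zero back would be silently lost. Disable all events first
	 * so that no new overflows can be raised until the status has
	 * been read and cleared.
	 */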
> 
> [...]
>>>> +	/*
>>>> +	 * We must NOT create groups containing mixed PMUs, although software
>>>> +	 * events are acceptable.
>>>> +	 */
>>>> +	if (event->group_leader->pmu != event->pmu &&
>>>> +	    !is_software_event(event->group_leader))
>>>> +		return -EINVAL;
>>>> +
>>>> +	for_each_sibling_event(sibling, event->group_leader) {
>>>> +		if (sibling->pmu != event->pmu &&
>>>> +		    !is_software_event(sibling))
>>>> +			return -EINVAL;
>>>> +	}
>>>
>>> As before, if you can't start, stop, or read multiple counters
>>> atomically, you can't really support groups of events for this PMU
>>> either. It's impossible to measure accurate ratios with a variable
>>> amount of skew between individual counters.
>>
>> Can you elaborate more? The only issue I know of is that we can't
>> stop all counters of the same PMU atomically in the IRQ handler to
>> prevent the race condition, but that can be fixed by manually
>> disabling each counter. Other than that, every counter works
>> independently.
> 
> The point of groups is to be able to count two or more events for the
> exact same time period, in order to measure ratios between them
> accurately. ->add, ->del, ->read, etc. are still called one at a time
> for each event in the group, but those calls are made as part of a
> transaction, which for most drivers is achieved by perf core calling
> ->pmu_disable and ->pmu_enable around the other calls. Since this
> driver doesn't have enable/disable functionality, the individual
> events will count for different lengths of time depending on what
> order those calls are made in (which is not necessarily constant),
> and how long each one takes. Thus you'll end up with an indeterminate
> amount of error between what each count represents, and the group is
> not really any more accurate than if the events were simply scheduled
> independently, which is not how it's supposed to work.
> 
> Come to think of it, you're also not validating that groups are even
> schedulable - try something like:
> 
> perf stat -e '{arm_dmc620_10008c000/clk_cycle_count/,arm_dmc620_10008c000/clk_request/,arm_dmc620_10008c000/clk_upload_stall/}' sleep 5
> 
> and observe perf core being very confused and unhappy when ->add
> starts failing for a group that ->event_init said was OK, since 3
> events won't actually fit into the 2 available counters.
> 
> As I said before, I think you probably would be able to make groups
> work with some careful hooking up of snapshot functionality to
> ->start_txn and ->commit_txn, but to start with it'll be an awful lot
> simpler to just be honest and reject them.

Got it. Thanks for educating me. I will allow only one HW event per
group then.

> 
> [...]
>>>> +	name = devm_kasprintf(&pdev->dev, GFP_KERNEL,
>>>> +			      "%s_%llx", DMC620_PMUNAME,
>>>> +			      (res->start) >> DMC620_PA_SHIFT);
>>>
>>> res->start doesn't need parentheses, however I guess it might need
>>> casting to u64 to solve the build warning (I'm not sure there's any
>>> nicer way, unfortunately).
>>
>> I will remove those parentheses; we don't need the u64 cast, as the
>> build warning only shows up when compile-testing on 32-bit.
> 
> As above, deliberately hacking the build for the sake of not fixing
> clearly dodgy code is crazy. The only correct format specifier for an
> expression of type phys_addr_t/resource_size_t is "%pa"; if you want
> to use a different format then explicitly converting the argument to
> a type appropriate for that format (either via a simple cast or an
> intermediate variable) is indisputably correct, regardless of whether
> you might happen to get away with an implicit conversion sometimes.
> 
> The whole point of maximising COMPILE_TEST coverage is to improve
> code quality in order to avoid this exact situation, wherein someone
> copies a pattern from an existing driver only to discover that it's
> not actually as robust as it should be.

Got it. Will fix it.

Thanks,
Tuan.

> 
> Robin.
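P.S. For the event group restriction, I'm thinking of something along
these lines - an untested sketch that keeps the shape of the existing
check but compares against the event itself, so any group containing a
second hardware event is rejected in event_init():

	/*
	 * We can't atomically disable all HW counters, so only one
	 * hardware event is allowed per group, although software
	 * events are acceptable.
	 */
	if (event->group_leader != event &&
	    !is_software_event(event->group_leader))
		return -EINVAL;

	for_each_sibling_event(sibling, event->group_leader) {
		if (sibling != event &&
		    !is_software_event(sibling))
			return -EINVAL;
	}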