On Tue, May 5, 2026 at 1:46 PM Maíra Canal <[email protected]> wrote: > > Hi Rob,
Thanks for the review Maíra! > On 06/03/26 13:36, Rob Herring (Arm) wrote: > > The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w > > supports up to 4 (U65) or 8 (U85) counters which can be programmed for > > different events. There is also a dedicated cycle counter. > > > > The ABI and implementation are copied from the V3D driver. The main > > I'm planning to send a series fixing some perfmon locking issues in the > V3D driver later this week. But, to give you a brief summary of the main > issues: > > 1. The active perfmon is not protected against races (run_job() vs. > IOCTLs). > > 2. ethosu_switch_perfmon() only starts/stops the perfmon in run_job(). > This means that if nothing is further queued up, a perfmon will never > actually be stopped. For ethosu, other than idle and cycle counts, nothing else counts when a job is done. Idle is just the time from enabling the counter to start a job and then from the job stopping to reading the counters. IOW, the idle count while a job is running is 0 and idle+active=cycles. > More comments inline: > > > difference in the ABI is there is no query API for the the event list. > > The events differ between the U65 and U85, so the events lists are > > maintained in userspace along with other differences between the U65 and > > U85. > > > > The cycle counter is always enabled when the PMU is enabled. When the > > user requests N events, reading the counters will return the N events > > plus the cycle counter. > > > > Signed-off-by: Rob Herring (Arm) <[email protected]> > > --- > > MR for mesa with initial support is here: > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40269 > > > > drivers/accel/ethosu/Makefile | 2 +- > > drivers/accel/ethosu/ethosu_device.h | 21 ++ > > drivers/accel/ethosu/ethosu_drv.c | 17 +- > > drivers/accel/ethosu/ethosu_drv.h | 62 ++++++ > > drivers/accel/ethosu/ethosu_job.c | 35 +++- > > drivers/accel/ethosu/ethosu_job.h | 2 + > > drivers/accel/ethosu/ethosu_perfmon.c | 282 ++++++++++++++++++++++++++ > > include/uapi/drm/ethosu_accel.h | 59 +++++- > > 8 files changed, 469 insertions(+), 11 deletions(-) > > create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c > > > > [...] > > > diff --git a/drivers/accel/ethosu/ethosu_drv.h > > b/drivers/accel/ethosu/ethosu_drv.h > > index 9e21dfe94184..57eee79e4b83 100644 > > --- a/drivers/accel/ethosu/ethosu_drv.h > > +++ b/drivers/accel/ethosu/ethosu_drv.h > > @@ -3,13 +3,75 @@ > > #ifndef __ETHOSU_DRV_H__ > > #define __ETHOSU_DRV_H__ > > > > +#include <linux/idr.h> > > +#include <linux/mutex.h> > > #include <drm/gpu_scheduler.h> > > > > struct ethosu_device; > > +struct drm_device; > > +struct drm_file; > > > > struct ethosu_file_priv { > > struct ethosu_device *edev; > > struct drm_sched_entity sched_entity; > > + > > + struct { > > + struct idr idr; > > You may want to consider moving to XArray as we did in V3D [1]. Okay. I'll have to study that some. I still don't have my head around XArray... > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a5b0d095bcdb219348ed8ae1c97ee99fc4913b8 > > > + struct mutex lock; > > + } perfmon; > > +}; > > + > > +/* Performance monitor object. The perform lifetime is controlled by > > userspace > > + * using perfmon related ioctls. A perfmon can be attached to a submit_cl > > + * request, and when this is the case, HW perf counters will be activated > > just > > + * before the submit_cl is submitted to the GPU and disabled when the job > > is > > + * done. This way, only events related to a specific job will be counted. > > + */ > > +struct ethosu_perfmon { > > + /* Tracks the number of users of the perfmon, when this counter > > reaches > > + * zero the perfmon is destroyed. > > + */ > > + refcount_t refcnt; > > + > > + /* Protects perfmon stop, as it can be invoked from multiple places. > > */ > > + struct mutex lock; > > I'm not familiar with the Ethos hardware, but *if* it has only a single > performance monitor active in HW at any given moment (like v3d), this > lock doesn't enforce this restriction. > > This lock only serializes start/stop of *one* perfmon object against > itself, but the invariant (at least on v3d) that needs protection is > device-wide: there is exactly one active perfmon at any moment in HW. You mean V3D has 1 perfmon shared among its 3? h/w queues? We only have 1 queue. We only touch the perfmon when we touch the h/w to run/end a job, so the same serialization for the job accesses should cover the PMU accesses. Rob

