Hi Rob,
On 12/05/26 11:45, Rob Herring wrote:
On Tue, May 5, 2026 at 1:46 PM Maíra Canal <[email protected]> wrote:
Hi Rob,
Thanks for the review Maíra!
On 06/03/26 13:36, Rob Herring (Arm) wrote:
The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
supports up to 4 (U65) or 8 (U85) counters which can be programmed for
different events. There is also a dedicated cycle counter.
The ABI and implementation are copied from the V3D driver. The main
I'm planning to send a series fixing some perfmon locking issues in the
V3D driver later this week. But, to give you a brief summary of the main
issues:
1. The active perfmon is not protected against races (run_job() vs.
IOCTLs).
2. ethosu_switch_perfmon() only starts/stops the perfmon in run_job().
This means that if nothing is further queued up, a perfmon will never
actually be stopped.
For ethosu, other than idle and cycle counts, nothing else counts when
a job is done. Idle is just the time from enabling the counter to
start a job and then from the job stopping to reading the counters.
IOW, the idle count while a job is running is 0 and
idle+active=cycles.
More comments inline:
difference in the ABI is there is no query API for the the event list.
The events differ between the U65 and U85, so the events lists are
maintained in userspace along with other differences between the U65 and
U85.
The cycle counter is always enabled when the PMU is enabled. When the
user requests N events, reading the counters will return the N events
plus the cycle counter.
Signed-off-by: Rob Herring (Arm) <[email protected]>
---
MR for mesa with initial support is here:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40269
drivers/accel/ethosu/Makefile | 2 +-
drivers/accel/ethosu/ethosu_device.h | 21 ++
drivers/accel/ethosu/ethosu_drv.c | 17 +-
drivers/accel/ethosu/ethosu_drv.h | 62 ++++++
drivers/accel/ethosu/ethosu_job.c | 35 +++-
drivers/accel/ethosu/ethosu_job.h | 2 +
drivers/accel/ethosu/ethosu_perfmon.c | 282 ++++++++++++++++++++++++++
include/uapi/drm/ethosu_accel.h | 59 +++++-
8 files changed, 469 insertions(+), 11 deletions(-)
create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c
[...]
diff --git a/drivers/accel/ethosu/ethosu_drv.h
b/drivers/accel/ethosu/ethosu_drv.h
index 9e21dfe94184..57eee79e4b83 100644
--- a/drivers/accel/ethosu/ethosu_drv.h
+++ b/drivers/accel/ethosu/ethosu_drv.h
@@ -3,13 +3,75 @@
#ifndef __ETHOSU_DRV_H__
#define __ETHOSU_DRV_H__
+#include <linux/idr.h>
+#include <linux/mutex.h>
#include <drm/gpu_scheduler.h>
struct ethosu_device;
+struct drm_device;
+struct drm_file;
struct ethosu_file_priv {
struct ethosu_device *edev;
struct drm_sched_entity sched_entity;
+
+ struct {
+ struct idr idr;
You may want to consider moving to XArray as we did in V3D [1].
Okay. I'll have to study that some. I still don't have my head around XArray...
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a5b0d095bcdb219348ed8ae1c97ee99fc4913b8
+ struct mutex lock;
+ } perfmon;
+};
+
+/* Performance monitor object. The perform lifetime is controlled by userspace
+ * using perfmon related ioctls. A perfmon can be attached to a submit_cl
+ * request, and when this is the case, HW perf counters will be activated just
+ * before the submit_cl is submitted to the GPU and disabled when the job is
+ * done. This way, only events related to a specific job will be counted.
+ */
+struct ethosu_perfmon {
+ /* Tracks the number of users of the perfmon, when this counter reaches
+ * zero the perfmon is destroyed.
+ */
+ refcount_t refcnt;
+
+ /* Protects perfmon stop, as it can be invoked from multiple places. */
+ struct mutex lock;
I'm not familiar with the Ethos hardware, but *if* it has only a single
performance monitor active in HW at any given moment (like v3d), this
lock doesn't enforce this restriction.
This lock only serializes start/stop of *one* perfmon object against
itself, but the invariant (at least on v3d) that needs protection is
device-wide: there is exactly one active perfmon at any moment in HW.
You mean V3D has 1 perfmon shared among its 3? h/w queues? We only
have 1 queue. We only touch the perfmon when we touch the h/w to
run/end a job, so the same serialization for the job accesses should
cover the PMU accesses.
I mean that userspace can create multiple software perfmons through
ethosu_ioctl_perfmon_create(), but AFAIU there is only one HW perfmon in
the device.
In that case, ethosu_perfmon->lock only serializes operations on a
single software perfmon object against itself. It does not serialize
accesses between different software perfmon objects, so two perfmons
could still race for the same unique HW perfmon.
If the HW exposes only one perfmon, it might be better to move this
lock to ethosu_device. That would protect the HW perfmon itself and
make the “only one active perfmon at a time” invariant explicit by
ensuring serialization between different software perfmons.
Best regards,
- Maíra
Rob