Re: [PATCH v4 1/4] drm/panthor: introduce job cycle and timestamp accounting
Hi Steven, thanks for the remarks.

On 19.07.2024 15:14, Steven Price wrote:
> On 16/07/2024 21:11, Adrián Larumbe wrote:
> > Enable calculations of job submission times in clock cycles and wall
> > time. This is done by expanding the boilerplate command stream when running
> > a job to include instructions that compute said times right before and after
> > a user CS.
> >
> > Those numbers are stored in the queue's group's sync objects BO, right
> > after them. Because the queues in a group might have a different number of
> > slots, one must keep track of the overall slot tally when reckoning the
> > offset of a queue's time sample structs, one for each slot.
> >
> > This commit is done in preparation for enabling DRM fdinfo support in the
> > Panthor driver, which depends on the numbers calculated herein.
> >
> > A profile mode device flag has been added that will in a future commit
> > allow UM to toggle time sampling behaviour, which is disabled by default to
> > save power. It also enables marking jobs as being profiled and picks one of
> > two call instruction arrays to insert into the ring buffer. One of them
> > includes FW logic to sample the timestamp and cycle counter registers and
> > write them into the job's syncobj, and the other does not.
> >
> > A profiled job's call sequence takes up two ring buffer slots, and this is
> > reflected when initialising the DRM scheduler for each queue, with a
> > profiled job contributing twice as many credits.
> >
> > Signed-off-by: Adrián Larumbe
>
> Thanks for the updates, this looks better. A few minor comments below.
>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h |   2 +
> >  drivers/gpu/drm/panthor/panthor_sched.c  | 244 ---
> >  2 files changed, 216 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index e388c0472ba7..3ede2f80df73 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -162,6 +162,8 @@ struct panthor_device {
> >  		 */
> >  		struct page *dummy_latest_flush;
> >  	} pm;
> > +
> > +	bool profile_mode;
> >  };
> >
> >  /**
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index 79ffcbc41d78..6438e5ea1f2b 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -93,6 +93,9 @@
> >  #define MIN_CSGS 3
> >  #define MAX_CSG_PRIO 0xf
> >
> > +#define NUM_INSTRS_PER_SLOT 16
> > +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64))
> > +
> >  struct panthor_group;
> >
> >  /**
> > @@ -466,6 +469,9 @@ struct panthor_queue {
> >  		 */
> >  		struct list_head in_flight_jobs;
> >  	} fence_ctx;
> > +
> > +	/** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */
> > +	unsigned long time_offset;
>
> AFAICT this doesn't need to be stored. We could just pass this value
> into group_create_queue() as an extra parameter where it's used.

I think we need to keep this offset value around: queues within the same
group can have different numbers of slots, so when fetching the sampled
values from the syncobjs BO in update_fdinfo_stats, we would otherwise have
to traverse the entire array of a queue's preceding queues and work out
their size in slots, so as to know how many struct panthor_job_times
samples to jump over after the preceding syncobj array.

> >  };
> >
> >  /**
> > @@ -592,7 +598,17 @@ struct panthor_group {
> >  	 * One sync object per queue. The position of the sync object is
> >  	 * determined by the queue index.
> >  	 */
> > -	struct panthor_kernel_bo *syncobjs;
> > +
> > +	struct {
> > +		/** @bo: Kernel BO holding the sync objects. */
> > +		struct panthor_kernel_bo *bo;
> > +
> > +		/**
> > +		 * @job_times_offset: Beginning of panthor_job_times struct samples after
> > +		 * the group's array of sync objects.
> > +		 */
> > +		size_t job_times_offset;
> > +	} syncobjs;
> >
> >  	/** @state: Group state. */
> >  	enum panthor_group_state state;
> > @@ -651,6 +667,18 @@ struct panthor_group {
> >  	struct li
[PATCH v4 3/4] drm/panthor: enable fdinfo for memory stats
Implement the drm object's status callback. Also, we consider a PRIME-imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage has already been allocated.

Signed-off-by: Adrián Larumbe
Reviewed-by: Liviu Dudau
---
 drivers/gpu/drm/panthor/panthor_gem.c | 12
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 38f560864879..c60b599665d8 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -145,6 +145,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags)
 	return drm_gem_prime_export(obj, flags);
 }
 
+static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj)
+{
+	struct panthor_gem_object *bo = to_panthor_bo(obj);
+	enum drm_gem_object_status res = 0;
+
+	if (bo->base.base.import_attach || bo->base.pages)
+		res |= DRM_GEM_OBJECT_RESIDENT;
+
+	return res;
+}
+
 static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.free = panthor_gem_free_object,
 	.print_info = drm_gem_shmem_object_print_info,
@@ -154,6 +165,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = panthor_gem_mmap,
+	.status = panthor_gem_status,
 	.export = panthor_gem_prime_export,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
-- 
2.45.1
[PATCH v4 2/4] drm/panthor: add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can accumulate the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Display of fdinfo key-value pairs is governed by a flag that is disabled by default in the present commit; supporting a manual toggle of it will be the matter of a later commit.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_devfreq.c | 18 -
 drivers/gpu/drm/panthor/panthor_device.h  | 10 +
 drivers/gpu/drm/panthor/panthor_drv.c     | 33
 drivers/gpu/drm/panthor/panthor_sched.c   | 47 +++
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c
index c6d3c327cc24..9d0f891b9b53 100644
--- a/drivers/gpu/drm/panthor/panthor_devfreq.c
+++ b/drivers/gpu/drm/panthor/panthor_devfreq.c
@@ -62,14 +62,20 @@ static void panthor_devfreq_update_utilization(struct panthor_devfreq *pdevfreq)
 static int panthor_devfreq_target(struct device *dev, unsigned long *freq,
 				  u32 flags)
 {
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
 	struct dev_pm_opp *opp;
+	int err;
 
 	opp = devfreq_recommended_opp(dev, freq, flags);
 	if (IS_ERR(opp))
 		return PTR_ERR(opp);
 	dev_pm_opp_put(opp);
 
-	return dev_pm_opp_set_rate(dev, *freq);
+	err = dev_pm_opp_set_rate(dev, *freq);
+	if (!err)
+		ptdev->current_frequency = *freq;
+
+	return err;
 }
 
 static void panthor_devfreq_reset(struct panthor_devfreq *pdevfreq)
@@ -130,6 +136,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 	struct panthor_devfreq *pdevfreq;
 	struct dev_pm_opp *opp;
 	unsigned long cur_freq;
+	unsigned long freq = ULONG_MAX;
 	int ret;
 
 	pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL);
@@ -161,6 +168,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 		return PTR_ERR(opp);
 
 	panthor_devfreq_profile.initial_freq = cur_freq;
+	ptdev->current_frequency = cur_freq;
 
 	/* Regulator coupling only takes care of synchronizing/balancing voltage
 	 * updates, but the coupled regulator needs to be enabled manually.
@@ -204,6 +212,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 
 	dev_pm_opp_put(opp);
 
+	/* Find the fastest defined rate */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	ptdev->fast_rate = freq;
+
+	dev_pm_opp_put(opp);
+
 	/*
 	 * Setup default thresholds for the simple_ondemand governor.
 	 * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 3ede2f80df73..4536fbf43a4e 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -163,9 +163,16 @@ struct panthor_device {
 		struct page *dummy_latest_flush;
 	} pm;
 
+	unsigned long current_frequency;
+	unsigned long fast_rate;
 	bool profile_mode;
 };
 
+struct panthor_gpu_usage {
+	u64 time;
+	u64 cycles;
+};
+
 /**
  * struct panthor_file - Panthor file
  */
@@ -178,6 +185,9 @@ struct panthor_file {
 
 	/** @groups: Scheduling group pool attached to this file. */
 	struct panthor_group_pool *groups;
+
+	/** @stats: cycle and timestamp measures for job execution. */
+	struct panthor_gpu_usage stats;
 };
 
 int panthor_device_init(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index b8a84f26b3ef..6a0c1a06a709 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -3,12 +3,17 @@
 /* Copyright 2019 Linaro, Ltd., Rob Herring */
 /* Copyright 2019 Collabora ltd. */
 
+#ifdef CONFIG_ARM_ARCH_TIMER
+#include <clocksource/arm_arch_timer.h>
+#endif
+
 #include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1351,6 +1356,32 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
 	return ret;
 }
 
+static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
+				    struct panthor_file *pfile,
+				    struct drm_printer *p)
+{
+	if (ptdev->profile_mode) {
+#ifdef CONFIG_ARM_ARCH_TIMER
+		drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
+			   DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
+					    arch_timer_get_cntfrq()));
+#endif
+		drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
+	}
+	drm_print
[PATCH v4 4/4] drm/panthor: add sysfs knob for enabling job profiling
This commit introduces a DRM device sysfs attribute that lets UM control the job accounting status in the device. The knob variable was brought in as part of a previous commit, and now we're able to toggle it manually.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_drv.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 6a0c1a06a709..a2876310856f 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1448,6 +1448,41 @@ static void panthor_remove(struct platform_device *pdev)
 	panthor_device_unplug(ptdev);
 }
 
+static ssize_t profiling_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%d\n", ptdev->profile_mode);
+}
+
+static ssize_t profiling_store(struct device *dev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+	bool value;
+	int err;
+
+	err = kstrtobool(buf, &value);
+	if (err)
+		return err;
+
+	ptdev->profile_mode = value;
+
+	return len;
+}
+
+static DEVICE_ATTR_RW(profiling);
+
+static struct attribute *panthor_attrs[] = {
+	&dev_attr_profiling.attr,
+	NULL,
+};
+
+ATTRIBUTE_GROUPS(panthor);
+
 static const struct of_device_id dt_match[] = {
 	{ .compatible = "rockchip,rk3588-mali" },
 	{ .compatible = "arm,mali-valhall-csf" },
@@ -1467,6 +1502,7 @@ static struct platform_driver panthor_driver = {
 		.name = "panthor",
 		.pm = pm_ptr(&panthor_pm_ops),
 		.of_match_table = dt_match,
+		.dev_groups = panthor_groups,
 	},
 };
-- 
2.45.1
[PATCH v4 1/4] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS.

Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot.

This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein.

A profile mode device flag has been added that will, in a future commit, allow UM to toggle time sampling behaviour; it is disabled by default to save power. It also enables marking jobs as being profiled, and picks one of two call instruction arrays to insert into the ring buffer. One of them includes FW logic to sample the timestamp and cycle counter registers and write them into the job's syncobj, and the other does not.

A profiled job's call sequence takes up two ring buffer slots, and this is reflected when initialising the DRM scheduler for each queue, with a profiled job contributing twice as many credits.
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_device.h |   2 +
 drivers/gpu/drm/panthor/panthor_sched.c  | 244 ---
 2 files changed, 216 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index e388c0472ba7..3ede2f80df73 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -162,6 +162,8 @@ struct panthor_device {
 		 */
 		struct page *dummy_latest_flush;
 	} pm;
+
+	bool profile_mode;
 };
 
 /**
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 79ffcbc41d78..6438e5ea1f2b 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -93,6 +93,9 @@
 #define MIN_CSGS 3
 #define MAX_CSG_PRIO 0xf
 
+#define NUM_INSTRS_PER_SLOT 16
+#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64))
+
 struct panthor_group;
 
 /**
@@ -466,6 +469,9 @@ struct panthor_queue {
 		 */
 		struct list_head in_flight_jobs;
 	} fence_ctx;
+
+	/** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */
+	unsigned long time_offset;
 };
 
 /**
@@ -592,7 +598,17 @@ struct panthor_group {
 	 * One sync object per queue. The position of the sync object is
 	 * determined by the queue index.
 	 */
-	struct panthor_kernel_bo *syncobjs;
+
+	struct {
+		/** @bo: Kernel BO holding the sync objects. */
+		struct panthor_kernel_bo *bo;
+
+		/**
+		 * @job_times_offset: Beginning of panthor_job_times struct samples after
+		 * the group's array of sync objects.
+		 */
+		size_t job_times_offset;
+	} syncobjs;
 
 	/** @state: Group state. */
 	enum panthor_group_state state;
@@ -651,6 +667,18 @@ struct panthor_group {
 	struct list_head wait_node;
 };
 
+struct panthor_job_times {
+	struct {
+		u64 before;
+		u64 after;
+	} cycles;
+
+	struct {
+		u64 before;
+		u64 after;
+	} time;
+};
+
 /**
  * group_queue_work() - Queue a group work
  * @group: Group to queue the work for.
@@ -730,6 +758,9 @@ struct panthor_job {
 	/** @queue_idx: Index of the queue inside @group. */
 	u32 queue_idx;
 
+	/** @ringbuf_idx: Index of the ringbuffer inside @queue. */
+	u32 ringbuf_idx;
+
 	/** @call_info: Information about the userspace command stream call. */
 	struct {
 		/** @start: GPU address of the userspace command stream. */
@@ -764,6 +795,9 @@ struct panthor_job {
 
 	/** @done_fence: Fence signaled when the job is finished or cancelled. */
 	struct dma_fence *done_fence;
+
+	/** @is_profiled: Whether timestamp and cycle numbers were gathered for this job */
+	bool is_profiled;
 };
 
 static void
@@ -844,7 +878,7 @@ static void group_release_work(struct work_struct *work)
 	panthor_kernel_bo_destroy(group->suspend_buf);
 	panthor_kernel_bo_destroy(group->protm_suspend_buf);
-	panthor_kernel_bo_destroy(group->syncobjs);
+	panthor_kernel_bo_destroy(group->syncobjs.bo);
 	panthor_vm_put(group->vm);
 
 	kfree(group);
@@ -1969,8 +2003,6 @@ tick_ctx_init(struct panthor_scheduler *sched,
 	}
 }
 
-#define NUM_INSTRS_PER_SLOT 16
-
 static void group_term_post_processing(struct panth
[PATCH v4 0/4] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation.

Previous discussion can be found at
https://lore.kernel.org/dri-devel/dqhnxhgho6spfh7xhw6yvs2iiqeqzeg63e6jqqpw2g7gkrfphn@dojsixyl4esv/

Changelog:

v4:
- Fixed wrong assignment location for frequency values in Panthor's devfreq
- Removed the last two commits about registering size of internal BO's
- Rearranged patch series so that sysfs knob is done last and all the
  previous time sampling and fdinfo show dependencies are already in place

v3:
- Fixed some nits and removed useless bounds check in panthor_sched.c
- Added support for sysfs profiling knob and optional job accounting
- Added new patches for calculating size of internal BO's

v2:
- Split original first patch in two, one for FW CS cycle and timestamp
  calculations and job accounting memory management, and a second one
  that enables fdinfo.
- Moved NUM_INSTRS_PER_SLOT to the file prelude
- Removed nelem variable from the group's struct definition.
- Precompute size of group's syncobj BO to avoid code duplication.
- Some minor nits.

Adrián Larumbe (4):
  drm/panthor: introduce job cycle and timestamp accounting
  drm/panthor: add DRM fdinfo support
  drm/panthor: enable fdinfo for memory stats
  drm/panthor: add sysfs knob for enabling job profiling

 drivers/gpu/drm/panthor/panthor_devfreq.c |  18 +-
 drivers/gpu/drm/panthor/panthor_device.h  |  12 +
 drivers/gpu/drm/panthor/panthor_drv.c     |  69 +
 drivers/gpu/drm/panthor/panthor_gem.c     |  12 +
 drivers/gpu/drm/panthor/panthor_sched.c   | 291 +++---
 5 files changed, 371 insertions(+), 31 deletions(-)

-- 
2.45.1
Re: [PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
Hi Steven, thanks for the review. On 13.06.2024 16:28, Steven Price wrote: > On 06/06/2024 01:49, Adrián Larumbe wrote: > > This patch series enables userspace utilities like gputop and nvtop to > > query a render context's fdinfo file and figure out rates of engine > > and memory utilisation. > > > > Previous discussion can be found at > > https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ > > > > Changelog: > > v3: > > - Fixed some nits and removed useless bounds check in panthor_sched.c > > - Added support for sysfs profiling knob and optional job accounting > > - Added new patches for calculating size of internal BO's > > v2: > > - Split original first patch in two, one for FW CS cycle and timestamp > > calculations and job accounting memory management, and a second one > > that enables fdinfo. > > - Moved NUM_INSTRS_PER_SLOT to the file prelude > > - Removed nelem variable from the group's struct definition. > > - Precompute size of group's syncobj BO to avoid code duplication. > > - Some minor nits. > > > > > > Adrián Larumbe (7): > > drm/panthor: introduce job cycle and timestamp accounting > > drm/panthor: add DRM fdinfo support > > drm/panthor: enable fdinfo for memory stats > > drm/panthor: add sysfs knob for enabling job profiling > > drm/panthor: support job accounting > > drm/drm_file: add display of driver's internal memory size > > drm/panthor: register size of internal objects through fdinfo > > The general shape of what you end up with looks correct, but these > patches are now in a bit of a mess. It's confusing to review when the > accounting is added unconditionally and then a sysfs knob is added which > changes it all to be conditional. Equally that last patch (register size > of internal objects through fdinfo) includes a massive amount of churn > moving everything into an 'fdinfo' struct which really should be in a > separate patch. 
> > Ideally this needs to be reworked into a logical series of patches with > knowledge of what's coming next. E.g. the first patch could introduce > the code for cycle/timestamp accounting but leave it disabled to be then > enabled by the sysfs knob patch. > > One thing I did notice though is that I wasn't seeing the GPU frequency > change, looking more closely at this it seems like there's something > dodgy going on with the devfreq code. From what I can make out I often > end up in a situation where all contexts are idle every time tick_work() > is called - I think this is simply because tick_work() is scheduled with > a delay and by the time the delay has hit the work is complete. Nothing > to do with this series, but something that needs looking into. I'm on > holiday for a week but I'll try to look at this when I'm back. I've found why the current frequency value wasn't updating when manually adjusting the device's devfreq governor. Fix will be part of the next patch series revision. 
Adrian > Steve > > > Documentation/gpu/drm-usage-stats.rst | 4 + > > drivers/gpu/drm/drm_file.c| 9 +- > > drivers/gpu/drm/msm/msm_drv.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.c | 2 + > > drivers/gpu/drm/panthor/panthor_device.h | 21 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 83 +- > > drivers/gpu/drm/panthor/panthor_fw.c | 16 +- > > drivers/gpu/drm/panthor/panthor_fw.h | 5 +- > > drivers/gpu/drm/panthor/panthor_gem.c | 67 - > > drivers/gpu/drm/panthor/panthor_gem.h | 16 +- > > drivers/gpu/drm/panthor/panthor_heap.c| 23 +- > > drivers/gpu/drm/panthor/panthor_heap.h| 6 +- > > drivers/gpu/drm/panthor/panthor_mmu.c | 8 +- > > drivers/gpu/drm/panthor/panthor_mmu.h | 3 +- > > drivers/gpu/drm/panthor/panthor_sched.c | 304 +++--- > > include/drm/drm_file.h| 7 +- > > 18 files changed, 522 insertions(+), 66 deletions(-) > > > > > > base-commit: 310ec03841a36e3f45fb528f0dfdfe5b9e84b037
Re: [PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
Hi Steven, On 13.06.2024 16:28, Steven Price wrote: > On 06/06/2024 01:49, Adrián Larumbe wrote: > > This patch series enables userspace utilities like gputop and nvtop to > > query a render context's fdinfo file and figure out rates of engine > > and memory utilisation. > > > > Previous discussion can be found at > > https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ > > > > Changelog: > > v3: > > - Fixed some nits and removed useless bounds check in panthor_sched.c > > - Added support for sysfs profiling knob and optional job accounting > > - Added new patches for calculating size of internal BO's > > v2: > > - Split original first patch in two, one for FW CS cycle and timestamp > > calculations and job accounting memory management, and a second one > > that enables fdinfo. > > - Moved NUM_INSTRS_PER_SLOT to the file prelude > > - Removed nelem variable from the group's struct definition. > > - Precompute size of group's syncobj BO to avoid code duplication. > > - Some minor nits. > > > > > > Adrián Larumbe (7): > > drm/panthor: introduce job cycle and timestamp accounting > > drm/panthor: add DRM fdinfo support > > drm/panthor: enable fdinfo for memory stats > > drm/panthor: add sysfs knob for enabling job profiling > > drm/panthor: support job accounting > > drm/drm_file: add display of driver's internal memory size > > drm/panthor: register size of internal objects through fdinfo > > The general shape of what you end up with looks correct, but these > patches are now in a bit of a mess. It's confusing to review when the > accounting is added unconditionally and then a sysfs knob is added which > changes it all to be conditional. Equally that last patch (register size > of internal objects through fdinfo) includes a massive amount of churn > moving everything into an 'fdinfo' struct which really should be in a > separate patch. I do agree with you in that perhaps too many things change across successive patches in the series. 
I think I can explain this because of the way the series has evolved through successive revisions. In the last one of them, only the first three patches were present, and both Liviu and Boris seemed happy with the shape they had taken, but then Boris suggested adding the sysfs knob and optional profiling support rather than submitting them as part of a different series, like I had done in Panfrost. In that spirit, I decided to keep the first three patches intact.

The last two patches are a bit more of an afterthought, and because they touch on the drm fdinfo core, I understood they were more likely to be rejected for now, at least until Tvrtko and the other people involved in the development of fdinfo had agreed on a way to report internal bo sizes. However, being also part of fdinfo, I thought this series was a good place to spark a debate about them, even if they don't seem as seamlessly linked with the rest of the work.

> Ideally this needs to be reworked into a logical series of patches with
> knowledge of what's coming next. E.g. the first patch could introduce
> the code for cycle/timestamp accounting but leave it disabled to be then
> enabled by the sysfs knob patch.
>
> One thing I did notice though is that I wasn't seeing the GPU frequency
> change, looking more closely at this it seems like there's something
> dodgy going on with the devfreq code. From what I can make out I often
> end up in a situation where all contexts are idle every time tick_work()
> is called - I think this is simply because tick_work() is scheduled with
> a delay and by the time the delay has hit the work is complete. Nothing
> to do with this series, but something that needs looking into. I'm on
> holiday for a week but I'll try to look at this when I'm back.

Would you mind sharing what you do in UM to trigger this behaviour, and also maybe the debug traces you've written into the driver to confirm this?
> Steve > > > Documentation/gpu/drm-usage-stats.rst | 4 + > > drivers/gpu/drm/drm_file.c| 9 +- > > drivers/gpu/drm/msm/msm_drv.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.c | 2 + > > drivers/gpu/drm/panthor/panthor_device.h | 21 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 83 +- > > drivers/gpu/drm/panthor/panthor_fw.c | 16 +- > > drivers/gpu/drm/panthor/panthor_fw.h | 5 +- > > drivers/gpu/drm/panthor/panthor_gem.c | 67 - > > drivers/gpu/drm/panthor/panthor_gem.h | 16 +- >
[PATCH v3 7/7] drm/panthor: register size of internal objects through fdinfo
This includes both DRM objects created to support queues, groups and heaps, and also objects whose pages are shared between the GPU and the MCU. However, this doesn't include objects that hold the firmware's binary regions, since these aren't owned by a render context and are allocated only once at driver initialisation time.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_device.c |  2 +
 drivers/gpu/drm/panthor/panthor_device.h | 13 +-
 drivers/gpu/drm/panthor/panthor_drv.c    | 20 ++---
 drivers/gpu/drm/panthor/panthor_fw.c     | 16 +--
 drivers/gpu/drm/panthor/panthor_fw.h     |  5 ++-
 drivers/gpu/drm/panthor/panthor_gem.c    | 55 ++--
 drivers/gpu/drm/panthor/panthor_gem.h    | 16 +--
 drivers/gpu/drm/panthor/panthor_heap.c   | 23 +++---
 drivers/gpu/drm/panthor/panthor_heap.h   |  6 ++-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  8 +++-
 drivers/gpu/drm/panthor/panthor_mmu.h    |  3 +-
 drivers/gpu/drm/panthor/panthor_sched.c  | 19 
 12 files changed, 147 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 4082c8f2951d..868fa9aba570 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -179,6 +179,8 @@ int panthor_device_init(struct panthor_device *ptdev)
 	if (ret)
 		return ret;
 
+	drmm_mutex_init(&ptdev->base, &ptdev->private_obj_list_lock);
+
 	/*
 	 * Set the dummy page holding the latest flush to 1. This will cause the
 	 * flush to be avoided as we know it isn't necessary if the submission
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index c3ec1e31f8b7..d3abf9700887 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -166,6 +166,9 @@ struct panthor_device {
 	bool profile_mode;
 	unsigned long current_frequency;
 	unsigned long fast_rate;
+
+	/** @private_obj_list_lock: Lock around per-file lists of internal GEM objects */
+	struct mutex private_obj_list_lock;
 };
 
 struct panthor_gpu_usage {
@@ -186,8 +189,14 @@ struct panthor_file {
 	/** @groups: Scheduling group pool attached to this file. */
 	struct panthor_group_pool *groups;
 
-	/** @stats: cycle and timestamp measures for job execution. */
-	struct panthor_gpu_usage stats;
+	/** @fdinfo: Open file tracking information */
+	struct {
+		/** @stats: cycle and timestamp measures for job execution. */
+		struct panthor_gpu_usage stats;
+
+		/** @private_file_list: File's list of private GEM objects. */
+		struct list_head private_file_list;
+	} fdinfo;
 };
 
 int panthor_device_init(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index a2876310856f..20a1add84014 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1048,13 +1048,13 @@ static int panthor_ioctl_tiler_heap_create(struct drm_device *ddev, void *data,
 	if (!vm)
 		return -EINVAL;
 
-	pool = panthor_vm_get_heap_pool(vm, true);
+	pool = panthor_vm_get_heap_pool(vm, true, pfile);
 	if (IS_ERR(pool)) {
 		ret = PTR_ERR(pool);
 		goto out_put_vm;
 	}
 
-	ret = panthor_heap_create(pool,
+	ret = panthor_heap_create(pool, pfile,
 				  args->initial_chunk_count,
 				  args->chunk_size,
 				  args->max_chunks,
@@ -1094,7 +1094,7 @@ static int panthor_ioctl_tiler_heap_destroy(struct drm_device *ddev, void *data,
 	if (!vm)
 		return -EINVAL;
 
-	pool = panthor_vm_get_heap_pool(vm, false);
+	pool = panthor_vm_get_heap_pool(vm, false, NULL);
 	if (IS_ERR(pool)) {
 		ret = PTR_ERR(pool);
 		goto out_put_vm;
@@ -1268,6 +1268,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 	pfile->ptdev = ptdev;
 
+	INIT_LIST_HEAD(&pfile->fdinfo.private_file_list);
+
 	ret = panthor_vm_pool_create(pfile);
 	if (ret)
 		goto err_free_file;
@@ -1295,6 +1297,12 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
 {
 	struct panthor_file *pfile = file->driver_priv;
 
+	/*
+	 * Group's internal BO's are destroyed asynchronously in a separate worker thread,
+	 * so there's a chance by the time BO release happens, the file is already gone.
+	 */
+	panthor_gem_dettach_internal_bos(pfile);
+
 	panthor_group_pool_destroy(pfile);
 	panthor_vm_pool_destroy(pfile);
 
@@ -1363,10 +1371,10 @@ static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
 	if (ptdev-
[PATCH v3 6/7] drm/drm_file: add display of driver's internal memory size
Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. Signed-off-by: Adrián Larumbe --- Documentation/gpu/drm-usage-stats.rst | 4 drivers/gpu/drm/drm_file.c | 9 +++-- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 19 insertions(+), 5 deletions(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..0da5ebecd232 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -157,6 +157,10 @@ The total size of buffers that are purgeable. The total size of buffers that are active on one or more engines. +- drm-internal-: [KiB|MiB] + +The total size of GEM objects that aren't exposed to user space. 
+ Implementation Details == diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 638ffaf5..d1c13eed8d34 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -874,9 +874,10 @@ void drm_print_memory_stats(struct drm_printer *p, enum drm_gem_object_status supported_status, const char *region) { - print_size(p, "total", region, stats->private + stats->shared); + print_size(p, "total", region, stats->private + stats->shared + stats->internal); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -890,11 +891,12 @@ EXPORT_SYMBOL(drm_print_memory_stats); * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats * @p: the printer to print output to * @file: the DRM file + * @func: driver-specific function pointer to count the size of internal objects * * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -940,6 +942,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 9c33f4e3f822..f97d3cdc4f50 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..53640ac44e42 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -570,7 +570,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index ab230d3af138..d71a5ac50ea9 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -464,6 +464,7 @@ void drm_send_event_timestamp_locked
[PATCH v3 5/7] drm/panthor: support job accounting
A previous commit brought in a sysfs knob to control the driver's profiling status. This changeset flags jobs as being profiled according to the driver's global profiling status, and picks one of two call instruction arrays to insert into the ring buffer. One of them includes FW logic to sample the timestamp and cycle counter registers and write them into the job's syncobj, and the other does not. A profiled job's call sequence takes up two ring buffer slots, and this is reflected when initialising the DRM scheduler for each queue, with a profiled job contributing twice as many credits. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_sched.c | 95 ++--- 1 file changed, 86 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index bbd20db40e7b..4fb6fc5c2314 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,7 +93,7 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf -#define NUM_INSTRS_PER_SLOT 32 +#define NUM_INSTRS_PER_SLOT 16 #define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) struct panthor_group; @@ -807,6 +807,9 @@ struct panthor_job { /** @done_fence: Fence signaled when the job is finished or cancelled.
*/ struct dma_fence *done_fence; + + /** @is_profiled: Whether timestamp and cycle numbers were gathered for this job */ + bool is_profiled; }; static void @@ -2865,7 +2868,8 @@ static void group_sync_upd_work(struct work_struct *work) dma_fence_end_signalling(cookie); list_for_each_entry_safe(job, job_tmp, _jobs, node) { - update_fdinfo_stats(job); + if (job->is_profiled) + update_fdinfo_stats(job); list_del_init(>node); panthor_job_put(>base); } @@ -2884,6 +2888,8 @@ queue_run_job(struct drm_sched_job *sched_job) u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1); u32 ringbuf_index = ringbuf_insert / (SLOTSIZE); + bool ringbuf_wraparound = + job->is_profiled && ((ringbuf_size/SLOTSIZE) == ringbuf_index + 1); u64 addr_reg = ptdev->csif_info.cs_reg_count - ptdev->csif_info.unpreserved_cs_reg_count; u64 val_reg = addr_reg + 2; @@ -2893,12 +2899,51 @@ queue_run_job(struct drm_sched_job *sched_job) job->queue_idx * sizeof(struct panthor_syncobj_64b); u64 times_addr = panthor_kernel_bo_gpuva(group->syncobjs.bo) + queue->time_offset + (ringbuf_index * sizeof(struct panthor_job_times)); + size_t call_insrt_size; + u64 *call_instrs; u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); struct dma_fence *done_fence; int ret; - u64 call_instrs[NUM_INSTRS_PER_SLOT] = { + u64 call_instrs_simple[NUM_INSTRS_PER_SLOT] = { + /* MOV32 rX+2, cs.latest_flush */ + (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, + + /* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */ + (36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233, + + /* MOV48 rX:rX+1, cs.start */ + (1ull << 56) | (addr_reg << 48) | job->call_info.start, + + /* MOV32 rX+2, cs.size */ + (2ull << 56) | (val_reg << 48) | job->call_info.size, + + /* WAIT(0) => waits for FLUSH_CACHE2 instruction */ + (3ull << 56) | (1 << 16), + + /* CALL rX:rX+1, rX+2 */ + (32ull << 56) | (addr_reg << 40) | (val_reg << 32), + + /* MOV48 
rX:rX+1, sync_addr */ + (1ull << 56) | (addr_reg << 48) | sync_addr, + + /* MOV48 rX+2, #1 */ + (1ull << 56) | (val_reg << 48) | 1, + + /* WAIT(all) */ + (3ull << 56) | (waitall_mask << 16), + + /* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/ + (51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1, + + /* ERROR_BARRIER, so we can recover from faults at job +* boundaries. +*/ + (47ull << 56), + }; + + u64 call_instrs_profile[NUM_INSTRS_PER_SLOT*2] = { /* MOV32 rX+2, cs.latest_flush */ (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, @@ -2960,9 +3005,18 @@
[PATCH v3 2/7] drm/panthor: add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can increase the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 +++ drivers/gpu/drm/panthor/panthor_sched.c | 46 +++ 4 files changed, 98 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index c6d3c327cc24..5eededaeade7 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index e388c0472ba7..8a0260a7b90a 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ struct page *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index b8a84f26b3ef..6d25385e02a1 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,12 +3,17 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. 
*/ +#ifdef CONFIG_ARM_ARCH_TIMER +#include +#endif + #include #include #include #include #include #include +#include #include #include @@ -1351,6 +1356,30 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1360,6 +1389,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mmap, + .show_fdinfo = drm_show_fdinfo, }; #ifdef CONFIG_DEBUG_FS @@ -1378,6 +1408,7 @@ static const struct drm_driver panthor_drm_driver = { DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, .open = panthor_
[PATCH v3 1/7] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS. Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot. NUM_INSTRS_PER_SLOT had to be increased to 32 because of adding new FW instructions for storing and subtracting the cycle counter and timestamp register, and it must always remain a power of two. This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein. Signed-off-by: Adrián Larumbe Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 156 1 file changed, 132 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index 79ffcbc41d78..62a67d6bd37a 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,6 +93,9 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf +#define NUM_INSTRS_PER_SLOT 32 +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) + struct panthor_group; /** @@ -466,6 +469,9 @@ struct panthor_queue { */ struct list_head in_flight_jobs; } fence_ctx; + + /** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */ + unsigned long time_offset; }; /** @@ -592,7 +598,17 @@ struct panthor_group { * One sync object per queue. The position of the sync object is * determined by the queue index. */ - struct panthor_kernel_bo *syncobjs; + + struct { + /** @bo: Kernel BO holding the sync objects.
*/ + struct panthor_kernel_bo *bo; + + /** + * @job_times_offset: Beginning of panthor_job_times struct samples after + * the group's array of sync objects. + */ + size_t job_times_offset; + } syncobjs; /** @state: Group state. */ enum panthor_group_state state; @@ -651,6 +667,18 @@ struct panthor_group { struct list_head wait_node; }; +struct panthor_job_times { + struct { + u64 before; + u64 after; + } cycles; + + struct { + u64 before; + u64 after; + } time; +}; + /** * group_queue_work() - Queue a group work * @group: Group to queue the work for. @@ -730,6 +758,9 @@ struct panthor_job { /** @queue_idx: Index of the queue inside @group. */ u32 queue_idx; + /** @ringbuf_idx: Index of the ringbuffer inside @queue. */ + u32 ringbuf_idx; + /** @call_info: Information about the userspace command stream call. */ struct { /** @start: GPU address of the userspace command stream. */ @@ -844,7 +875,7 @@ static void group_release_work(struct work_struct *work) panthor_kernel_bo_destroy(group->suspend_buf); panthor_kernel_bo_destroy(group->protm_suspend_buf); - panthor_kernel_bo_destroy(group->syncobjs.bo); + panthor_kernel_bo_destroy(group->syncobjs.bo); panthor_vm_put(group->vm); kfree(group); @@ -1969,8 +2000,6 @@ tick_ctx_init(struct panthor_scheduler *sched, } } -#define NUM_INSTRS_PER_SLOT 16 - static void group_term_post_processing(struct panthor_group *group) { @@ -2007,7 +2036,7 @@ group_term_post_processing(struct panthor_group *group) spin_unlock(&queue->fence_ctx.lock); /* Manually update the syncobj seqno to unblock waiters.
*/ - syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (i * sizeof(*syncobj)); syncobj->status = ~0; syncobj->seqno = atomic64_read(&queue->fence_ctx.seqno); sched_queue_work(group->ptdev->scheduler, sync_upd); @@ -2780,7 +2809,7 @@ static void group_sync_upd_work(struct work_struct *work) if (!queue) continue; - syncobj = group->syncobjs->kmap + (queue_idx * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (queue_idx * sizeof(*syncobj)); spin_lock(&queue->fence_ctx.lock); list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) { @@ -2815,11 +2844,17 @@ queue_run_job(struct drm_sched_job *sched_job) struct panthor_scheduler *sched = ptde
[PATCH v3 4/7] drm/panthor: add sysfs knob for enabling job profiling
Just like it is already present in Panfrost, this commit introduces a DRM device sysfs file that lets UM control the job accounting status in the device. The present commit only brings in the sysfs knob and also hides the cycles and engine fdinfo tags when it's disabled, but leveraging it for job accounting will be the matter of a later commit. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_device.h | 1 + drivers/gpu/drm/panthor/panthor_drv.c| 46 +--- 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 8a0260a7b90a..c3ec1e31f8b7 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -163,6 +163,7 @@ struct panthor_device { struct page *dummy_latest_flush; } pm; + bool profile_mode; unsigned long current_frequency; unsigned long fast_rate; }; diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index 6d25385e02a1..a2876310856f 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1360,12 +1360,14 @@ static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, struct panthor_file *pfile, struct drm_printer *p) { + if (ptdev->profile_mode) { #ifdef CONFIG_ARM_ARCH_TIMER - drm_printf(p, "drm-engine-panthor:\t%llu ns\n", - DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), - arch_timer_get_cntfrq())); + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); #endif - drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + } drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); } @@ -1446,6 +1448,41 @@ static void panthor_remove(struct platform_device *pdev) 
panthor_device_unplug(ptdev); } +static ssize_t profiling_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct panthor_device *ptdev = dev_get_drvdata(dev); + + return sysfs_emit(buf, "%d\n", ptdev->profile_mode); +} + +static ssize_t profiling_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct panthor_device *ptdev = dev_get_drvdata(dev); + bool value; + int err; + + err = kstrtobool(buf, &value); + if (err) + return err; + + ptdev->profile_mode = value; + + return len; +} + +static DEVICE_ATTR_RW(profiling); + +static struct attribute *panthor_attrs[] = { + &dev_attr_profiling.attr, + NULL, +}; + +ATTRIBUTE_GROUPS(panthor); + static const struct of_device_id dt_match[] = { { .compatible = "rockchip,rk3588-mali" }, { .compatible = "arm,mali-valhall-csf" }, @@ -1465,6 +1502,7 @@ static struct platform_driver panthor_driver = { .name = "panthor", .pm = pm_ptr(&panthor_pm_ops), .of_match_table = dt_match, + .dev_groups = panthor_groups, }, }; -- 2.45.1
[PATCH v3 3/7] drm/panthor: enable fdinfo for memory stats
Implement the DRM object's status callback. Also, we consider a PRIME imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage had already been allocated. Signed-off-by: Adrián Larumbe Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_gem.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index 38f560864879..c60b599665d8 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -145,6 +145,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.base.import_attach || bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -154,6 +165,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.45.1
[PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Previous discussion can be found at https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ Changelog: v3: - Fixed some nits and removed useless bounds check in panthor_sched.c - Added support for sysfs profiling knob and optional job accounting - Added new patches for calculating size of internal BO's v2: - Split original first patch in two, one for FW CS cycle and timestamp calculations and job accounting memory management, and a second one that enables fdinfo. - Moved NUM_INSTRS_PER_SLOT to the file prelude - Removed nelem variable from the group's struct definition. - Precompute size of group's syncobj BO to avoid code duplication. - Some minor nits. Adrián Larumbe (7): drm/panthor: introduce job cycle and timestamp accounting drm/panthor: add DRM fdinfo support drm/panthor: enable fdinfo for memory stats drm/panthor: add sysfs knob for enabling job profiling drm/panthor: support job accounting drm/drm_file: add display of driver's internal memory size drm/panthor: register size of internal objects through fdinfo Documentation/gpu/drm-usage-stats.rst | 4 + drivers/gpu/drm/drm_file.c| 9 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.c | 2 + drivers/gpu/drm/panthor/panthor_device.h | 21 ++ drivers/gpu/drm/panthor/panthor_drv.c | 83 +- drivers/gpu/drm/panthor/panthor_fw.c | 16 +- drivers/gpu/drm/panthor/panthor_fw.h | 5 +- drivers/gpu/drm/panthor/panthor_gem.c | 67 - drivers/gpu/drm/panthor/panthor_gem.h | 16 +- drivers/gpu/drm/panthor/panthor_heap.c| 23 +- drivers/gpu/drm/panthor/panthor_heap.h| 6 +- drivers/gpu/drm/panthor/panthor_mmu.c | 8 +- drivers/gpu/drm/panthor/panthor_mmu.h | 3 +- drivers/gpu/drm/panthor/panthor_sched.c | 304 +++--- 
include/drm/drm_file.h| 7 +- 18 files changed, 522 insertions(+), 66 deletions(-) base-commit: 310ec03841a36e3f45fb528f0dfdfe5b9e84b037 -- 2.45.1
[PATCH v4 2/3] drm/lima: Fix dma_resv deadlock at drm object pin time
Commit a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin(), which also tries to lock the same dma_resv, leading to a double lock situation. As was already done for Panfrost in the previous commit, fix it by replacing drm_gem_shmem_pin() with its locked variant. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..9bb997dbb4b9 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -185,7 +185,7 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_pin_locked(&bo->base); } static int lima_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map) -- 2.45.1
[PATCH v4 3/3] drm/gem-shmem: Add import attachment warning to locked pin function
Commit ec144244a43f ("drm/gem-shmem: Acquire reservation lock in GEM pin/unpin callbacks") moved locking DRM object's dma reservation to drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so we need to make sure the non-NULL check warning is also added to the latter. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..ad5d9f704e15 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -233,6 +233,8 @@ int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem) dma_resv_assert_held(shmem->base.resv); + drm_WARN_ON(shmem->base.dev, shmem->base.import_attach); + ret = drm_gem_shmem_get_pages(shmem); return ret; -- 2.45.1
[PATCH v4 0/3] drm: Fix dma_resv deadlock at drm object pin time
This is v4 of https://lore.kernel.org/lkml/20240521181817.097af...@collabora.com/T/ The goal of this patch series is fixing a deadlock upon locking the dma reservation of a DRM gem object when pinning it, at a prime import operation. Changelog: v3: - Split driver fixes into separate commits for Panfrost and Lima - Make drivers call drm_gem_shmem_pin_locked instead of drm_gem_shmem_object_pin - Improved commit message for first patch to explain why dma resv locking in the pin callback is no longer necessary. v2: - Removed comment explaining reason why an already-locked pin function replaced the locked variant inside Panfrost's object pin callback. - Moved already-assigned attachment warning into generic already-locked gem object pin function Adrián Larumbe (3): drm/panfrost: Fix dma_resv deadlock at drm object pin time drm/lima: Fix dma_resv deadlock at drm object pin time drm/gem-shmem: Add import attachment warning to locked pin function drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) base-commit: 7acacca1b157fcb258cfd781603425f73bc7370b -- 2.45.1
[PATCH v4 1/3] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost's GEM object pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by replacing drm_gem_shmem_pin() with its locked version, as the lock had already been taken by drm_gem_pin(). Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..8e0ff3efede7 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,7 +192,7 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_pin_locked(&bo->base); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.45.1
Re: [PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time
Hi Boris and Thomas, On 02.05.2024 14:18, Thomas Zimmermann wrote: > Hi > > Am 02.05.24 um 14:00 schrieb Boris Brezillon: > > On Thu, 2 May 2024 13:59:41 +0200 > > Boris Brezillon wrote: > > > > > Hi Thomas, > > > > > > On Thu, 2 May 2024 13:51:16 +0200 > > > Thomas Zimmermann wrote: > > > > > > > Hi, > > > > > > > > ignoring my r-b on patch 1, I'd like to rethink the current patches in > > > > general. > > > > > > > > I think drm_gem_shmem_pin() should become the locked version of _pin(), > > > > so that drm_gem_shmem_object_pin() can call it directly. The existing > > > > _pin_unlocked() would not be needed any longer. Same for the _unpin() > > > > functions. This change would also fix the consistency with the semantics > > > > of the shmem _vmap() functions, which never take reservation locks. > > > > > > > > There are only two external callers of drm_gem_shmem_pin(): the test > > > > case and panthor. These assume that drm_gem_shmem_pin() acquires the > > > > reservation lock. The test case should likely call drm_gem_pin() > > > > instead. That would acquire the reservation lock and the test would > > > > validate that shmem's pin helper integrates well into the overall GEM > > > > framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me. > > > > For now, it could receive a wrapper that takes the lock and that's it. > > > I do agree that the current inconsistencies in the naming is > > > troublesome (sometimes _unlocked, sometimes _locked, with the version > > > without any suffix meaning either _locked or _unlocked depending on > > > what the suffixed version does), and that's the very reason I asked > > > Dmitry to address that in his shrinker series [1]. So, ideally I'd > > > prefer if patches from Dmitry's series were applied instead of > > > trying to fix that here (IIRC, we had an ack from Maxime). > > With the link this time :-). > > > > [1]https://lore.kernel.org/lkml/20240105184624.508603-1-dmitry.osipe...@collabora.com/T/ > > Thanks. 
I remember these patches. Somehow I thought they would have been > merged already. I wasn't super happy about the naming changes in patch 5, > because the names of the GEM object callbacks do no longer correspond with > their implementations. But anyway. > > If we go that direction, we should here simply push drm_gem_shmem_pin() and > drm_gem_shmem_unpin() into panthor and update the shmem tests with > drm_gem_pin(). Panfrost and lima would call drm_gem_shmem_pin_locked(). IMHO > we should not promote the use of drm_gem_shmem_object_*() functions, as they > are meant to be callbacks for struct drm_gem_object_funcs. (Auto-generating > them would be nice.) I'll be doing this in the next patch series iteration, casting the pin function's drm object parameter to an shmem object. Also for the sake of leaving things in a consistent state, and against Boris' advice, I think I'll leave the drm WARN statement inside drm_gem_shmem_pin_locked. I guess even though Dmitry's working on it, rebasing his work on top of this minor change shouldn't be an issue. Cheers, Adrian Larumbe > Best regards > Thomas > > > > > > > Regards, > > > > > > Boris
Re: [PATCH v2 3/3] drm/panthor: Enable fdinfo for memory stats
On 24.04.2024 18:34, Liviu Dudau wrote: > Hello, > > On Tue, Apr 23, 2024 at 10:32:36PM +0100, Adrián Larumbe wrote: > > When vm-binding an already-created BO, the entirety of its virtual size is > > then backed by system memory, so its RSS is always the same as its virtual > > size. > > How is that relevant to this patch? Or to put it differently: how are your > words describing your code change here? I think I wrote this as a throw-back to the time when we handled RSS calculations for Panfrost objects, because heap BO's would be mapped on demand at every page fault. I understand that without mention of this the remark seems out of context, so depending on your taste I can either expand the message to underline this, or perhaps drop it altogether. I think I'd rather go for the latter, since the fact that panthor_gem_funcs includes no binding for drm_gem_object_funcs::rss() should be enough of a hint at this. > > > > Also, we consider a PRIME imported BO to be resident if its matching > > dma_buf has an open attachment, which means its backing storage had already > > been allocated. 
> > Reviewed-by: Liviu Dudau > > Best regards, > Liviu > > > > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panthor/panthor_gem.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_gem.c > > b/drivers/gpu/drm/panthor/panthor_gem.c > > index d6483266d0c2..386c0dfeeb5f 100644 > > --- a/drivers/gpu/drm/panthor/panthor_gem.c > > +++ b/drivers/gpu/drm/panthor/panthor_gem.c > > @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, > > int flags) > > return drm_gem_prime_export(obj, flags); > > } > > > > +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object > > *obj) > > +{ > > + struct panthor_gem_object *bo = to_panthor_bo(obj); > > + enum drm_gem_object_status res = 0; > > + > > + if (bo->base.base.import_attach || bo->base.pages) > > + res |= DRM_GEM_OBJECT_RESIDENT; > > + > > + return res; > > +} > > + > > static const struct drm_gem_object_funcs panthor_gem_funcs = { > > .free = panthor_gem_free_object, > > .print_info = drm_gem_shmem_object_print_info, > > @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs > > panthor_gem_funcs = { > > .vmap = drm_gem_shmem_object_vmap, > > .vunmap = drm_gem_shmem_object_vunmap, > > .mmap = panthor_gem_mmap, > > + .status = panthor_gem_status, > > .export = panthor_gem_prime_export, > > .vm_ops = _gem_shmem_vm_ops, > > }; > > -- > > 2.44.0 > > Adrian Larumbe
Re: [PATCH v2 1/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
Hi Daniel, On 02.05.2024 10:09, Daniel Vetter wrote: > On Wed, May 01, 2024 at 07:50:43PM +0100, Adrián Larumbe wrote: > > Up to this day, all fdinfo-based GPU profilers must traverse the entire > > /proc directory structure to find open DRM clients with fdinfo file > > descriptors. This is inefficient and time-consuming. > > > > This patch adds a new DRM ioctl that allows users to obtain a list of PIDs > > for clients who have opened the DRM device. Output from the ioctl isn't > > human-readable, and it's meant to be retrieved only by GPU profilers like > > gputop and nvtop. > > > > Cc: Rob Clark > > Cc: Tvrtko Ursulin > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/drm_internal.h | 1 + > > drivers/gpu/drm/drm_ioctl.c| 89 ++ > > include/uapi/drm/drm.h | 7 +++ > > 3 files changed, 97 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h > > index 690505a1f7a5..6f78954cae16 100644 > > --- a/drivers/gpu/drm/drm_internal.h > > +++ b/drivers/gpu/drm/drm_internal.h > > @@ -243,6 +243,7 @@ static inline void drm_debugfs_encoder_remove(struct > > drm_encoder *encoder) > > drm_ioctl_t drm_version; > > drm_ioctl_t drm_getunique; > > drm_ioctl_t drm_getclient; > > +drm_ioctl_t drm_getclients; > > > > /* drm_syncobj.c */ > > void drm_syncobj_open(struct drm_file *file_private); > > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c > > index e368fc084c77..da7057376581 100644 > > --- a/drivers/gpu/drm/drm_ioctl.c > > +++ b/drivers/gpu/drm/drm_ioctl.c > > @@ -207,6 +207,93 @@ int drm_getclient(struct drm_device *dev, void *data, > > } > > } > > > > +/* > > + * Get list of client PIDs who have opened a DRM file > > + * > > + * \param dev DRM device we are querying > > + * \param data IOCTL command input. > > + * \param file_priv DRM file private. > > + * > > + * \return zero on success or a negative number on failure. 
> > + * > > + * Traverses list of open clients for the given DRM device, and > > + * copies them into userspace as an array of PIDs > > + */ > > +int drm_getclients(struct drm_device *dev, void *data, > > + struct drm_file *file_priv) > > + > > +{ > > + struct drm_get_clients *get_clients = data; > > + ssize_t size = get_clients->len; > > + char __user *pid_buf; > > + ssize_t offset = 0; > > + int ret = 0; > > + > > + /* > > +* We do not want to show clients of display only devices so > > +* as to avoid confusing UM GPU profilers > > +*/ > > + if (!dev->render) { > > + get_clients->len = 0; > > + return 0; > > + } > > + > > + /* > > +* An input size of zero means UM wants to know the size of the PID > > buffer > > +* We round it up to the nearest multiple of the page size so that we > > can have > > +* some spare headroom in case more clients came in between successive > > calls > > +* of this ioctl, and also to simplify parsing of the PIDs buffer, > > because > > +* sizeof(pid_t) will hopefully always divide PAGE_SIZE > > +*/ > > + if (size == 0) { > > + get_clients->len = > > + roundup(atomic_read(&dev->open_count) * sizeof(pid_t), > > PAGE_SIZE); > > + return 0; > > + } > > + > > + pid_buf = (char *)(void *)get_clients->user_data; > > + > > + if (!pid_buf) > > + return -EINVAL; > > + > > + mutex_lock(&dev->filelist_mutex); > > + list_for_each_entry_reverse(file_priv, &dev->filelist, lhead) { > > + pid_t pid_num; > > + > > + if ((size - offset) < sizeof(pid_t)) > > + break; > > + > > + rcu_read_lock(); > > + pid_num = pid_vnr(rcu_dereference(file_priv->pid)); > > + rcu_read_unlock(); > > + > > + /* We do not want to return the profiler's PID */ > > + if (pid_vnr(task_tgid(current)) == pid_num) > > + continue; > > + > > + ret = copy_to_user(pid_buf + offset, &pid_num, sizeof(pid_t)); > > + if (ret) > > + break; > > + > > + offset += sizeof(pid_t); > > + } > > + mutex_unlock(&dev->filelist_mutex); > > + > > + if (ret) > > + retu
[PATCH v2 1/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
Up to this day, all fdinfo-based GPU profilers must traverse the entire /proc directory structure to find open DRM clients with fdinfo file descriptors. This is inefficient and time-consuming. This patch adds a new DRM ioctl that allows users to obtain a list of PIDs for clients who have opened the DRM device. Output from the ioctl isn't human-readable, and it's meant to be retrieved only by GPU profilers like gputop and nvtop. Cc: Rob Clark Cc: Tvrtko Ursulin Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_internal.h | 1 + drivers/gpu/drm/drm_ioctl.c| 89 ++ include/uapi/drm/drm.h | 7 +++ 3 files changed, 97 insertions(+) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 690505a1f7a5..6f78954cae16 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -243,6 +243,7 @@ static inline void drm_debugfs_encoder_remove(struct drm_encoder *encoder) drm_ioctl_t drm_version; drm_ioctl_t drm_getunique; drm_ioctl_t drm_getclient; +drm_ioctl_t drm_getclients; /* drm_syncobj.c */ void drm_syncobj_open(struct drm_file *file_private); diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index e368fc084c77..da7057376581 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -207,6 +207,93 @@ int drm_getclient(struct drm_device *dev, void *data, } } +/* + * Get list of client PIDs who have opened a DRM file + * + * \param dev DRM device we are querying + * \param data IOCTL command input. + * \param file_priv DRM file private. + * + * \return zero on success or a negative number on failure. 
+ * + * Traverses list of open clients for the given DRM device, and + * copies them into userspace as an array of PIDs + */ +int drm_getclients(struct drm_device *dev, void *data, + struct drm_file *file_priv) + +{ + struct drm_get_clients *get_clients = data; + ssize_t size = get_clients->len; + char __user *pid_buf; + ssize_t offset = 0; + int ret = 0; + + /* +* We do not want to show clients of display only devices so +* as to avoid confusing UM GPU profilers +*/ + if (!dev->render) { + get_clients->len = 0; + return 0; + } + + /* +* An input size of zero means UM wants to know the size of the PID buffer +* We round it up to the nearest multiple of the page size so that we can have +* some spare headroom in case more clients came in between successive calls +* of this ioctl, and also to simplify parsing of the PIDs buffer, because +* sizeof(pid_t) will hopefully always divide PAGE_SIZE +*/ + if (size == 0) { + get_clients->len = + roundup(atomic_read(&dev->open_count) * sizeof(pid_t), PAGE_SIZE); + return 0; + } + + pid_buf = (char *)(void *)get_clients->user_data; + + if (!pid_buf) + return -EINVAL; + + mutex_lock(&dev->filelist_mutex); + list_for_each_entry_reverse(file_priv, &dev->filelist, lhead) { + pid_t pid_num; + + if ((size - offset) < sizeof(pid_t)) + break; + + rcu_read_lock(); + pid_num = pid_vnr(rcu_dereference(file_priv->pid)); + rcu_read_unlock(); + + /* We do not want to return the profiler's PID */ + if (pid_vnr(task_tgid(current)) == pid_num) + continue; + + ret = copy_to_user(pid_buf + offset, &pid_num, sizeof(pid_t)); + if (ret) + break; + + offset += sizeof(pid_t); + } + mutex_unlock(&dev->filelist_mutex); + + if (ret) + return -EFAULT; + + if ((size - offset) >= sizeof(pid_t)) { + pid_t pid_zero = 0; + + ret = copy_to_user(pid_buf + offset, + &pid_zero, sizeof(pid_t)); + if (ret) + return -EFAULT; + } + + return 0; +} + /* * Get statistics information.
* @@ -672,6 +759,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER), DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER), DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER), + + DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENTS, drm_getclients, DRM_RENDER_ALLOW), }; #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE(drm_ioctls) diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 16122819edfe..c47aa9de51ab 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -1024,6 +1024,11 @@ struct drm_crtc_queue_sequence { __u64 user_data;/* user d
[PATCH v2 0/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
This is v2 of the patch being discussed at https://lore.kernel.org/dri-devel/20240403182951.724488-1-adrian.laru...@collabora.com/ In the original patch, a DRM device sysfs attribute file was chosen as the interface for fetching the list of active client PIDs. That came with a host of problems: - Normal device attributes can only send back up to a page worth of data, which might not be enough if many clients are opening the DRM device. - The binary attribute interface is meant for immutable virtual files, but the list of active PIDs can grow and shrink between successive calls of show(). This led me to believe sysfs is not the right tool for the job, so I switched over to a custom DRM ioctl that does the same thing. In order to test this patch, one can use WIP branches for both libdrm and igt at: https://gitlab.freedesktop.org/larumbe/igt-gpu-tools/-/tree/drm-clients-ioctl?ref_type=heads https://gitlab.freedesktop.org/larumbe/drm/-/tree/drm-clients-ioctl?ref_type=heads I've only tested it with gputop, but intel-gputop should work also. Adrián Larumbe (1): drm: Add ioctl for querying a DRM device's list of open client PIDs drivers/gpu/drm/drm_internal.h | 1 + drivers/gpu/drm/drm_ioctl.c| 89 ++ include/uapi/drm/drm.h | 7 +++ 3 files changed, 97 insertions(+) -- 2.44.0
[PATCH v3 2/2] drm/gem-shmem: Add import attachment warning to locked pin function
Commit ec144244a43f ("drm/gem-shmem: Acquire reservation lock in GEM pin/unpin callbacks") moved locking DRM object's dma reservation to drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so we need to make sure the non-NULL check warning is also added to the latter. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..ad5d9f704e15 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -233,6 +233,8 @@ int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem) dma_resv_assert_held(shmem->base.resv); + drm_WARN_ON(shmem->base.dev, shmem->base.import_attach); + ret = drm_gem_shmem_get_pages(shmem); return ret; -- 2.44.0
[PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time
This is v3 of https://lore.kernel.org/dri-devel/20240424090429.57de7...@collabora.com/ The goal of this patch series is fixing a deadlock upon locking the dma reservation of a DRM gem object when pinning it, at a prime import operation. Changes from v2: - Removed comment explaining reason why an already-locked pin function replaced the locked variant inside Panfrost's object pin callback. - Moved already-assigned attachment warning into generic already-locked gem object pin function Adrián Larumbe (2): drm/panfrost: Fix dma_resv deadlock at drm object pin time drm/gem-shmem: Add import attachment warning to locked pin function drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) base-commit: 75b68f22e39aafb22f3d8e3071e1aba73560788c -- 2.44.0
[PATCH v3 1/2] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost GEM object's pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by assuming the object's reservation had already been locked by the time we reach panfrost_gem_pin. Do the same thing for the Lima driver, as it most likely suffers from the same issue. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..c4e0f9faaa47 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -185,7 +185,7 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_object_pin(obj); } static int lima_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..f268bd5c2884 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,
+192,7 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_object_pin(obj); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.44.0
Re: [PATCH v2 3/4] drm/panthor: Relax the constraints on the tiler chunk size
Hi Boris, On 30.04.2024 13:28, Boris Brezillon wrote: > The field used to store the chunk size is 12 bits wide, and the encoding > is chunk_size = chunk_header.chunk_size << 12, which gives us a > theoretical [4k:8M] range. This range is further limited by > implementation constraints, and all known implementations seem to > impose a [128k:8M] range, so do the same here. > > We also relax the power-of-two constraint, which doesn't seem to > exist on v10. This will allow userspace to fine-tune initial/max > tiler memory on memory-constrained devices. > > v2: > - Turn the power-of-two constraint into a page-aligned constraint to allow > fine-tune of the initial/max heap memory size > - Fix the panthor_heap_create() kerneldoc > > Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block") > Signed-off-by: Boris Brezillon > --- > drivers/gpu/drm/panthor/panthor_heap.c | 8 > include/uapi/drm/panthor_drm.h | 6 +- > 2 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_heap.c > b/drivers/gpu/drm/panthor/panthor_heap.c > index 3be86ec383d6..683bb94761bc 100644 > --- a/drivers/gpu/drm/panthor/panthor_heap.c > +++ b/drivers/gpu/drm/panthor/panthor_heap.c > @@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, > u32 handle) > * @pool: Pool to instantiate the heap context from. > * @initial_chunk_count: Number of chunk allocated at initialization time. > * Must be at least 1. > - * @chunk_size: The size of each chunk. Must be a power of two between 256k > - * and 2M. > + * @chunk_size: The size of each chunk. Must be page-aligned and lie in the > + * [128k:2M] range. Probably a typo, but I guess this should be [128k:8M] ? > * @max_chunks: Maximum number of chunks that can be allocated. > * @target_in_flight: Maximum number of in-flight render passes.
> * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap > @@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool, > if (initial_chunk_count > max_chunks) > return -EINVAL; > > - if (hweight32(chunk_size) != 1 || > - chunk_size < SZ_256K || chunk_size > SZ_2M) > + if (!IS_ALIGNED(chunk_size, PAGE_SIZE) || > + chunk_size < SZ_128K || chunk_size > SZ_8M) > return -EINVAL; > > down_read(&pool->lock); > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > index 5db80a0682d5..b8220d2e698f 100644 > --- a/include/uapi/drm/panthor_drm.h > +++ b/include/uapi/drm/panthor_drm.h > @@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create { > /** @initial_chunk_count: Initial number of chunks to allocate. Must be > at least one. */ > __u32 initial_chunk_count; > > - /** @chunk_size: Chunk size. Must be a power of two at least 256KB > large. */ > + /** > + * @chunk_size: Chunk size. > + * > + * Must be page-aligned and lie in the [128k:8M] range. > + */ > __u32 chunk_size; > > /** > -- > 2.44.0 Adrian Larumbe
Re: [PATCH] drm/sysfs: Add drm class-wide attribute to get active device clients
Hi Tvrtko, On 15.04.2024 13:50, Tvrtko Ursulin wrote: > > On 05/04/2024 18:59, Rob Clark wrote: > > On Wed, Apr 3, 2024 at 11:37 AM Adrián Larumbe > > wrote: > > > > > > Up to this day, all fdinfo-based GPU profilers must traverse the entire > > > /proc directory structure to find open DRM clients with fdinfo file > > > descriptors. This is inefficient and time-consuming. > > > > > > This patch adds a new device class attribute that will install a sysfs > > > file > > > per DRM device, which can be queried by profilers to get a list of PIDs > > > for > > > their open clients. This file isn't human-readable, and it's meant to be > > > queried only by GPU profilers like gputop and nvtop. > > > > > > Cc: Boris Brezillon > > > Cc: Tvrtko Ursulin > > > Cc: Christopher Healy > > > Signed-off-by: Adrián Larumbe > > > > It does seem like a good idea.. idk if there is some precedent to > > prefer binary vs ascii in sysfs, but having a way to avoid walking > > _all_ processes is a good idea. > > I naturally second that it is a needed feature, but I do not think binary > format is justified. AFAIR it should be used for things like hw/fw > standardised tables or firmware images, not when exporting a simple list of > PIDs. It also precludes easy shell/script access and the benefit of avoiding > parsing a short list is I suspect completely dwarfed by needing to parse all > the related fdinfo etc. I'd rather keep it as a binary file for the sake of easily parsing the number list on the client side, in gputop or nvtop. For textual access, there's already a debugfs file that presents the same information, so I thought it was best not to duplicate that functionality and restrict sysfs to serving the very specific use case of UM profilers having to access the DRM client list. I should mention I did something controversial here, which is a semantically binary attribute through the regular attribute interface. 
I guess if I keep it as a binary attribute in the end, I should switch over to the binary attribute API. Another reason why I implemented it as a binary file is that we can only send back at most a whole page. If a PID takes 4 bytes, that's usually 1024 clients at most, which is probably enough for any UM profiler, but will decrease even more if we turn it into an ASCII readable file. I did some research into sysfs binary attributes, and while some sources mention that it's often used for dumping or loading of driver FW, none of them claim it cannot be used for other purposes. > > > --- > > > drivers/gpu/drm/drm_internal.h | 2 +- > > > drivers/gpu/drm/drm_privacy_screen.c | 2 +- > > > drivers/gpu/drm/drm_sysfs.c | 89 ++-- > > > 3 files changed, 74 insertions(+), 19 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_internal.h > > > b/drivers/gpu/drm/drm_internal.h > > > index 2215baef9a3e..9a399b03d11c 100644 > > > --- a/drivers/gpu/drm/drm_internal.h > > > +++ b/drivers/gpu/drm/drm_internal.h > > > @@ -145,7 +145,7 @@ bool drm_master_internal_acquire(struct drm_device > > > *dev); > > > void drm_master_internal_release(struct drm_device *dev); > > > > > > /* drm_sysfs.c */ > > > -extern struct class *drm_class; > > > +extern struct class drm_class; > > > > > > int drm_sysfs_init(void); > > > void drm_sysfs_destroy(void); > > > diff --git a/drivers/gpu/drm/drm_privacy_screen.c > > > b/drivers/gpu/drm/drm_privacy_screen.c > > > index 6cc39e30781f..2fbd24ba5818 100644 > > > --- a/drivers/gpu/drm/drm_privacy_screen.c > > > +++ b/drivers/gpu/drm/drm_privacy_screen.c > > > @@ -401,7 +401,7 @@ struct drm_privacy_screen > > > *drm_privacy_screen_register( > > > mutex_init(&priv->lock); > > > BLOCKING_INIT_NOTIFIER_HEAD(&priv->notifier_head); > > > > > > - priv->dev.class = drm_class; > > > + priv->dev.class = &drm_class; > > > priv->dev.type = &drm_privacy_screen_type; > > > priv->dev.parent = parent; > > > priv->dev.release = drm_privacy_screen_device_release; > > > diff --git
a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c > > > index a953f69a34b6..56ca9e22c720 100644 > > > --- a/drivers/gpu/drm/drm_sysfs.c > > > +++ b/drivers/gpu/drm/drm_sysfs.c > > > @@ -58,8 +58,6 @@ static struct device_type drm_sysfs_device_connector = { > > > .name = "drm_connector", > > >
[PATCH v2] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost GEM object's pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by assuming the object's reservation had already been locked by the time we reach panfrost_gem_pin. Do the same thing for the Lima driver, as it most likely suffers from the same issue. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Reviewed-by: Boris Brezillon Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 9 +++-- drivers/gpu/drm/panfrost/panfrost_gem.c | 8 +++- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..8a5bcf498ef6 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -184,8 +184,13 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - - return drm_gem_shmem_pin(&bo->base); + /* +* Pinning can only happen in response to a prime attachment request +* from another driver, but dma reservation locking is already being +* handled by drm_gem_pin +*/ + drm_WARN_ON(obj->dev, obj->import_attach); + return drm_gem_shmem_object_pin(obj); } static int lima_gem_vmap(struct drm_gem_object *obj, struct
iosys_map *map) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..e3fbcb020617 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,7 +192,13 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + /* +* Pinning can only happen in response to a prime attachment request +* from another driver, but dma reservation locking is already being +* handled by drm_gem_pin +*/ + drm_WARN_ON(obj->dev, obj->import_attach); + return drm_gem_shmem_object_pin(obj); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.44.0
[PATCH v2 3/3] drm/panthor: Enable fdinfo for memory stats
When vm-binding an already-created BO, the entirety of its virtual size is then backed by system memory, so its RSS is always the same as its virtual size. Also, we consider a PRIME imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage had already been allocated. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_gem.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index d6483266d0c2..386c0dfeeb5f 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.base.import_attach || bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.44.0
[PATCH v2 2/3] drm/panthor: Add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can increase the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 +++ drivers/gpu/drm/panthor/panthor_sched.c | 46 +++ 4 files changed, 98 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index c6d3c327cc24..5eededaeade7 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 2fdd671b38fd..b5b5dfe3cafe 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ struct page *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index b8a84f26b3ef..6d25385e02a1 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,12 +3,17 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. 
*/ +#ifdef CONFIG_ARM_ARCH_TIMER +#include +#endif + #include #include #include #include #include #include +#include #include #include @@ -1351,6 +1356,30 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1360,6 +1389,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mmap, + .show_fdinfo = drm_show_fdinfo, }; #ifdef CONFIG_DEBUG_FS @@ -1378,6 +1408,7 @@ static const struct drm_driver panthor_drm_driver = { DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, .open = panthor_
[PATCH v2 0/3] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Changelog: v2: - Split original first patch in two, one for FW CS cycle and timestamp calculations and job accounting memory management, and a second one that enables fdinfo. - Moved NUM_INSTRS_PER_SLOT to the file prelude - Removed nelem variable from the group's struct definition. - Precompute size of group's syncobj BO to avoid code duplication. - Some minor nits. Adrián Larumbe (3): drm/panthor: introduce job cycle and timestamp accounting drm/panthor: Add DRM fdinfo support drm/panthor: Enable fdinfo for memory stats drivers/gpu/drm/panthor/panthor_devfreq.c | 10 ++ drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 drivers/gpu/drm/panthor/panthor_gem.c | 12 ++ drivers/gpu/drm/panthor/panthor_sched.c | 204 +++--- 5 files changed, 244 insertions(+), 24 deletions(-) base-commit: a6325ad47bc808aeb4c69ae36e0236c2c6d400b5 -- 2.44.0
[PATCH v2 1/3] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS. Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot. NUM_INSTRS_PER_SLOT had to be increased to 32 because of adding new FW instructions for storing and subtracting the cycle counter and timestamp register, and it must always remain a power of two. This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_sched.c | 158 1 file changed, 134 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index b3a51a6de523..320dfa0388ba 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,6 +93,9 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf +#define NUM_INSTRS_PER_SLOT 32 +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) + struct panthor_group; /** @@ -466,6 +469,9 @@ struct panthor_queue { */ struct list_head in_flight_jobs; } fence_ctx; + + /** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */ + unsigned long time_offset; }; /** @@ -580,7 +586,17 @@ struct panthor_group { * One sync object per queue. The position of the sync object is * determined by the queue index. */ - struct panthor_kernel_bo *syncobjs; + + struct { + /** @bo: Kernel BO holding the sync objects. */ + struct panthor_kernel_bo *bo; + + /** +* @times_offset: Beginning of panthor_job_times struct samples after +* the group's array of sync objects.
+*/ + size_t times_offset; + } syncobjs; /** @state: Group state. */ enum panthor_group_state state; @@ -639,6 +655,18 @@ struct panthor_group { struct list_head wait_node; }; +struct panthor_job_times { + struct { + u64 before; + u64 after; + } cycles; + + struct { + u64 before; + u64 after; + } time; +}; + /** * group_queue_work() - Queue a group work * @group: Group to queue the work for. @@ -718,6 +746,9 @@ struct panthor_job { /** @queue_idx: Index of the queue inside @group. */ u32 queue_idx; + /** @ringbuf_idx: Index of the ringbuffer inside @queue. */ + u32 ringbuf_idx; + /** @call_info: Information about the userspace command stream call. */ struct { /** @start: GPU address of the userspace command stream. */ @@ -833,7 +864,7 @@ static void group_release_work(struct work_struct *work) panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), group->suspend_buf); panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), group->protm_suspend_buf); - panthor_kernel_bo_destroy(group->vm, group->syncobjs); + panthor_kernel_bo_destroy(group->vm, group->syncobjs.bo); panthor_vm_put(group->vm); kfree(group); @@ -1924,8 +1955,6 @@ tick_ctx_init(struct panthor_scheduler *sched, } } -#define NUM_INSTRS_PER_SLOT16 - static void group_term_post_processing(struct panthor_group *group) { @@ -1962,7 +1991,7 @@ group_term_post_processing(struct panthor_group *group) spin_unlock(>fence_ctx.lock); /* Manually update the syncobj seqno to unblock waiters. 
*/ - syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (i * sizeof(*syncobj)); syncobj->status = ~0; syncobj->seqno = atomic64_read(>fence_ctx.seqno); sched_queue_work(group->ptdev->scheduler, sync_upd); @@ -2729,7 +2758,7 @@ static void group_sync_upd_work(struct work_struct *work) if (!queue) continue; - syncobj = group->syncobjs->kmap + (queue_idx * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (queue_idx * sizeof(*syncobj)); spin_lock(>fence_ctx.lock); list_for_each_entry_safe(job, job_tmp, >fence_ctx.in_flight_jobs, node) { @@ -2764,15 +2793,23 @@ queue_run_job(struct drm_sched_job *sched_job)
Re: [PATCH 1/2] drm/panthor: Enable fdinfo for cycle and time measurements
Hi Liviu, Thanks for your review. Also want to apologise for replying so late. Today I'll be sending a v2 of this patch series after having applied all your suggestions. On 28.03.2024 15:49, Liviu Dudau wrote: > Hi Adrián, > > Appologies for the delay in reviewing this. > > On Tue, Mar 05, 2024 at 09:05:49PM +, Adrián Larumbe wrote: > > These values are sampled by the firmware right before jumping into the UM > > command stream and immediately after returning from it, and then kept > > inside a > > per-job accounting structure. That structure is held inside the group's > > syncobjs > > buffer object, at an offset that depends on the job's queue slot number and > > the > > queue's index within the group. > > I think this commit message is misleadingly short compared to the size of the > changes. If I may, I would like to suggest that you split this commit into two > parts, one introducing the changes in the ringbuf and syncobjs structures and > the other exporting the statistics in the fdinfo. 
> > > > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.h | 11 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 31 > > drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- > > 4 files changed, 241 insertions(+), 28 deletions(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c > > b/drivers/gpu/drm/panthor/panthor_devfreq.c > > index 7ac4fa290f27..51a7b734edcd 100644 > > --- a/drivers/gpu/drm/panthor/panthor_devfreq.c > > +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c > > @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device > > *dev, > > spin_lock_irqsave(>lock, irqflags); > > > > panthor_devfreq_update_utilization(pdevfreq); > > + ptdev->current_frequency = status->current_frequency; > > > > status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, > >pdevfreq->idle_time)); > > @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > struct panthor_devfreq *pdevfreq; > > struct dev_pm_opp *opp; > > unsigned long cur_freq; > > + unsigned long freq = ULONG_MAX; > > int ret; > > > > pdevfreq = drmm_kzalloc(>base, sizeof(*ptdev->devfreq), > > GFP_KERNEL); > > @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > > > dev_pm_opp_put(opp); > > > > + /* Find the fastest defined rate */ > > + opp = dev_pm_opp_find_freq_floor(dev, ); > > + if (IS_ERR(opp)) > > + return PTR_ERR(opp); > > + ptdev->fast_rate = freq; > > + > > + dev_pm_opp_put(opp); > > + > > /* > > * Setup default thresholds for the simple_ondemand governor. > > * The values are chosen based on experiments. 
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h > > b/drivers/gpu/drm/panthor/panthor_device.h > > index 51c9d61b6796..10e970921ca3 100644 > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > @@ -162,6 +162,14 @@ struct panthor_device { > > */ > > u32 *dummy_latest_flush; > > } pm; > > + > > + unsigned long current_frequency; > > + unsigned long fast_rate; > > +}; > > + > > +struct panthor_gpu_usage { > > + u64 time; > > + u64 cycles; > > }; > > > > /** > > @@ -176,6 +184,9 @@ struct panthor_file { > > > > /** @groups: Scheduling group pool attached to this file. */ > > struct panthor_group_pool *groups; > > + > > + /** @stats: cycle and timestamp measures for job execution. */ > > + struct panthor_gpu_usage stats; > > }; > > > > int panthor_device_init(struct panthor_device *ptdev); > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c > > b/drivers/gpu/drm/panthor/panthor_drv.c > > index ff484506229f..fa06b9e2c6cd 100644 > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > @@ -3,6 +3,10 @@ > > /* Copyright 2019 Linaro, Ltd., Rob Herring */ > > /* Copyright 2019 Collabora ltd. */ > > > > +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER > > +#include > > +#endif > > + > > #include > > #include > > #include > > @@ -28,6 +32,8 @@ > > #i
[PATCH] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which another driver is preparing a
dma-buf attachment, the core drm gem object pinning code already takes a
lock on the object's dma reservation. However, Panfrost's GEM object
pinning callback would then try to take the lock on the same dma
reservation when delegating pinning of the object to the shmem subsystem,
which led to a deadlock.

This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which
reports the following recursive locking situation:

weston/3440 is trying to acquire lock:
00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper]

but task is already holding lock:
00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm]

Fix it by assuming the object's reservation had already been locked by the
time we reach panfrost_gem_pin.

Cc: Thomas Zimmermann
Cc: Dmitry Osipenko
Cc: Boris Brezillon
Cc: Steven Price
Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index d47b40b82b0b..6c26652d425d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -192,7 +192,12 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
 	if (bo->is_heap)
 		return -EINVAL;
 
-	return drm_gem_shmem_pin(&bo->base);
+	/*
+	 * Pinning can only happen in response to a prime attachment request from
+	 * another driver, but that's already being handled by drm_gem_pin
+	 */
+	drm_WARN_ON(obj->dev, obj->import_attach);
+	return drm_gem_shmem_pin_locked(&bo->base);
 }
 
 static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)

base-commit: 04687bff66b8a4b22748aa7215d3baef0b318e5b
-- 
2.44.0
Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
On 04.04.2024 11:31, Maíra Canal wrote: > On 4/4/24 11:00, Adrián Larumbe wrote: > > This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose > > the total GPU usage stats on sysfs"). The point is making broader GPU > > occupancy numbers available through the sysfs interface, so that for every > > job slot, its number of processed jobs and total processing time are > > displayed. > > Shouldn't we make this sysfs interface a generic DRM interface? > Something that would be standard for all drivers and that we could > integrate into gputop in the future. I think the best way to generalise this sysfs knob would be to create a DRM class attribute somewhere in drivers/gpu/drm/drm_sysfs.c and then adding a new function to 'struct drm_driver' that would return a structure with the relevant information (execution units and their names, number of processed jobs, etc). What that information would exactly be is up to debate, I guess, since different drivers might be interested in showing different bits of information. Laying that down is important because the sysfs file would become part of the device class API. I might come up with a new RFC patch series that does precisely that, at least for v3d and Panfrost, and maybe other people could pitch in with the sort of things they'd like to see for other drivers? 
Cheers, Adrian > Best Regards, > - Maíra > > > > > Cc: Boris Brezillon > > Cc: Christopher Healy > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ > > drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- > > drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- > > drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ > > 4 files changed, 68 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h > > b/drivers/gpu/drm/panfrost/panfrost_device.h > > index cffcb0ac7c11..1d343351c634 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_device.h > > +++ b/drivers/gpu/drm/panfrost/panfrost_device.h > > @@ -169,6 +169,11 @@ struct panfrost_engine_usage { > > unsigned long long cycles[NUM_JOB_SLOTS]; > > }; > > +struct panfrost_slot_usage { > > + u64 enabled_ns; > > + u64 jobs_sent; > > +}; > > + > > struct panfrost_file_priv { > > struct panfrost_device *pfdev; > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c > > b/drivers/gpu/drm/panfrost/panfrost_drv.c > > index ef9f6c0716d5..6afcde66270f 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c > > @@ -8,6 +8,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc > > panfrost_drm_driver_ioctls[] = { > > PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), > > }; > > +static const char * const engine_names[] = { > > + "fragment", "vertex-tiler", "compute-only" > > +}; > > + > > static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, > > struct panfrost_file_priv *panfrost_priv, > > struct drm_printer *p) > > @@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct > > panfrost_device *pfdev, > > * job spent on the GPU. 
> > */ > > - static const char * const engine_names[] = { > > - "fragment", "vertex-tiler", "compute-only" > > - }; > > - > > BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); > > for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { > > @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, > > static DEVICE_ATTR_RW(profiling); > > +static ssize_t > > +gpu_stats_show(struct device *dev, struct device_attribute *attr, char > > *buf) > > +{ > > + struct panfrost_device *pfdev = dev_get_drvdata(dev); > > + struct panfrost_slot_usage stats; > > + u64 timestamp = local_clock(); > > + ssize_t len = 0; > > + unsigned int i; > > + > > + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); > > + > > + len += sysfs_emit(buf, "queuetimestampjobs > > runtime\n"); > > + len += sysfs_emit_at(buf, len, > > "-\n"); > > + > > + for (i = 0; i < NUM_JOB_SLOTS - 1;
[PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). The point is making broader GPU occupancy numbers available through the sysfs interface, so that for every job slot, its number of processed jobs and total processing time are displayed. Cc: Boris Brezillon Cc: Christopher Healy Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ 4 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index cffcb0ac7c11..1d343351c634 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -169,6 +169,11 @@ struct panfrost_engine_usage { unsigned long long cycles[NUM_JOB_SLOTS]; }; +struct panfrost_slot_usage { + u64 enabled_ns; + u64 jobs_sent; +}; + struct panfrost_file_priv { struct panfrost_device *pfdev; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..6afcde66270f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = { PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), }; +static const char * const engine_names[] = { + "fragment", "vertex-tiler", "compute-only" +}; + static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, struct panfrost_file_priv *panfrost_priv, struct drm_printer *p) @@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, * job spent on the GPU. 
*/ - static const char * const engine_names[] = { - "fragment", "vertex-tiler", "compute-only" - }; - BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, static DEVICE_ATTR_RW(profiling); +static ssize_t +gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct panfrost_device *pfdev = dev_get_drvdata(dev); + struct panfrost_slot_usage stats; + u64 timestamp = local_clock(); + ssize_t len = 0; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); + + len += sysfs_emit(buf, "queuetimestampjobs runtime\n"); + len += sysfs_emit_at(buf, len, "-\n"); + + for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { + + stats = get_slot_stats(pfdev, i); + + /* +* Each line will display the slot name, timestamp, the number +* of jobs handled by that engine and runtime, as shown below: +* +* queuetimestampjobsruntime +* - +* fragment 12252943467507 638 1184747640 +* vertex-tiler 12252943467507 636 121663838 +* +*/ + len += sysfs_emit_at(buf, len, "%-13s%-17llu%-12llu%llu\n", +engine_names[i], +timestamp, +stats.jobs_sent, +stats.enabled_ns); + } + + return len; +} +static DEVICE_ATTR_RO(gpu_stats); + static struct attribute *panfrost_attrs[] = { _attr_profiling.attr, + _attr_gpu_stats.attr, NULL, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index a61ef0af9a4e..4c779e6f4cb0 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -31,6 +31,8 @@ struct panfrost_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; + + struct panfrost_slot_usage stats; }; struct panfrost_job_slot { @@ -160,15 +162,20 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) WARN_ON(!job); if (job->is_profiled) { + u64 job_time = ktime_to_ns(ktime_sub(ktime_get(), job->start_time)); + if (job->engine_usage) { - 
job->
[PATCH] drm/sysfs: Add drm class-wide attribute to get active device clients
Up to this day, all fdinfo-based GPU profilers must traverse the entire
/proc directory structure to find open DRM clients with fdinfo file
descriptors. This is inefficient and time-consuming.

This patch adds a new device class attribute that will install a sysfs
file per DRM device, which can be queried by profilers to get a list of
PIDs for their open clients. This file isn't human-readable, and it's
meant to be queried only by GPU profilers like gputop and nvtop.

Cc: Boris Brezillon
Cc: Tvrtko Ursulin
Cc: Christopher Healy
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/drm_internal.h       |  2 +-
 drivers/gpu/drm/drm_privacy_screen.c |  2 +-
 drivers/gpu/drm/drm_sysfs.c          | 89 ++++++++++++++++++++++------
 3 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 2215baef9a3e..9a399b03d11c 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -145,7 +145,7 @@ bool drm_master_internal_acquire(struct drm_device *dev);
 void drm_master_internal_release(struct drm_device *dev);
 
 /* drm_sysfs.c */
-extern struct class *drm_class;
+extern struct class drm_class;
 
 int drm_sysfs_init(void);
 void drm_sysfs_destroy(void);
diff --git a/drivers/gpu/drm/drm_privacy_screen.c b/drivers/gpu/drm/drm_privacy_screen.c
index 6cc39e30781f..2fbd24ba5818 100644
--- a/drivers/gpu/drm/drm_privacy_screen.c
+++ b/drivers/gpu/drm/drm_privacy_screen.c
@@ -401,7 +401,7 @@ struct drm_privacy_screen *drm_privacy_screen_register(
 	mutex_init(&priv->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&priv->notifier_head);
 
-	priv->dev.class = drm_class;
+	priv->dev.class = &drm_class;
 	priv->dev.type = &drm_privacy_screen_type;
 	priv->dev.parent = parent;
 	priv->dev.release = drm_privacy_screen_device_release;
diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index a953f69a34b6..56ca9e22c720 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -58,8 +58,6 @@ static struct device_type drm_sysfs_device_connector = {
 	.name = "drm_connector",
 };
 
-struct class *drm_class;
-
 #ifdef CONFIG_ACPI
 static bool drm_connector_acpi_bus_match(struct device *dev)
 {
@@ -128,6 +126,62 @@ static const struct component_ops typec_connector_ops = {
 
 static CLASS_ATTR_STRING(version, S_IRUGO, "drm 1.1.0 20060810");
 
+static ssize_t clients_show(struct device *cd, struct device_attribute *attr, char *buf)
+{
+	struct drm_minor *minor = cd->driver_data;
+	struct drm_device *ddev = minor->dev;
+	struct drm_file *priv;
+	ssize_t offset = 0;
+	void *pid_buf;
+
+	if (minor->type != DRM_MINOR_RENDER)
+		return 0;
+
+	pid_buf = kvmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!pid_buf)
+		return 0;
+
+	mutex_lock(&ddev->filelist_mutex);
+	list_for_each_entry_reverse(priv, &ddev->filelist, lhead) {
+		struct pid *pid;
+
+		if (drm_WARN_ON(ddev, (PAGE_SIZE - offset) < sizeof(pid_t)))
+			break;
+
+		rcu_read_lock();
+		pid = rcu_dereference(priv->pid);
+		(*(pid_t *)(pid_buf + offset)) = pid_vnr(pid);
+		rcu_read_unlock();
+
+		offset += sizeof(pid_t);
+	}
+	mutex_unlock(&ddev->filelist_mutex);
+
+	if (offset < PAGE_SIZE)
+		(*(pid_t *)(pid_buf + offset)) = 0;
+
+	memcpy(buf, pid_buf, offset);
+
+	kvfree(pid_buf);
+
+	return offset;
+
+}
+static DEVICE_ATTR_RO(clients);
+
+static struct attribute *drm_device_attrs[] = {
+	&dev_attr_clients.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(drm_device);
+
+struct class drm_class = {
+	.name = "drm",
+	.dev_groups = drm_device_groups,
+};
+
+static bool drm_class_initialised;
+
 /**
  * drm_sysfs_init - initialize sysfs helpers
  *
@@ -142,18 +196,19 @@ int drm_sysfs_init(void)
 {
 	int err;
 
-	drm_class = class_create("drm");
-	if (IS_ERR(drm_class))
-		return PTR_ERR(drm_class);
+	err = class_register(&drm_class);
+	if (err)
+		return err;
 
-	err = class_create_file(drm_class, &class_attr_version.attr);
+	err = class_create_file(&drm_class, &class_attr_version.attr);
 	if (err) {
-		class_destroy(drm_class);
-		drm_class = NULL;
+		class_destroy(&drm_class);
 		return err;
 	}
 
-	drm_class->devnode = drm_devnode;
+	drm_class.devnode = drm_devnode;
+
+	drm_class_initialised = true;
 
 	drm_sysfs_acpi_register();
 	return 0;
@@ -166,12 +221,12 @@ int drm_sysfs_init(void)
 */
void drm_sysfs_destroy(void)
{
-	if (IS_ERR_OR_NULL(drm_class))
+	if (!drm_class_initialised)
		return;
	drm_sysfs_acpi_unregister()
[PATCH] drm/panfrost: Only display fdinfo's engine and cycle tags when profiling is on
If job accounting is disabled, then both fdinfo's drm-engine and drm-cycle key values will remain immutable. In that case, it makes more sense not to display them at all to avoid confusing user space profiling tools. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index eec250114114..ef9f6c0716d5 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -550,10 +550,12 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { - drm_printf(p, "drm-engine-%s:\t%llu ns\n", - engine_names[i], panfrost_priv->engine_usage.elapsed_ns[i]); - drm_printf(p, "drm-cycles-%s:\t%llu\n", - engine_names[i], panfrost_priv->engine_usage.cycles[i]); + if (pfdev->profile_mode) { + drm_printf(p, "drm-engine-%s:\t%llu ns\n", + engine_names[i], panfrost_priv->engine_usage.elapsed_ns[i]); + drm_printf(p, "drm-cycles-%s:\t%llu\n", + engine_names[i], panfrost_priv->engine_usage.cycles[i]); + } drm_printf(p, "drm-maxfreq-%s:\t%lu Hz\n", engine_names[i], pfdev->pfdevfreq.fast_rate); drm_printf(p, "drm-curfreq-%s:\t%lu Hz\n", base-commit: 97252d0a4bfbb07079503d059f7522d305fe0f7a -- 2.43.0
Re: [PATCH v3 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
On 11.03.2024 11:02, Boris Brezillon wrote: > On Wed, 6 Mar 2024 08:33:47 + > Tvrtko Ursulin wrote: > > > On 06/03/2024 01:56, Adrián Larumbe wrote: > > > Debugfs isn't always available in production builds that try to squeeze > > > every single byte out of the kernel image, but we still need a way to > > > toggle the timestamp and cycle counter registers so that jobs can be > > > profiled for fdinfo's drm engine and cycle calculations. > > > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > > the same functionality, and document its ABI in a separate file. > > > > > > Signed-off-by: Adrián Larumbe > > > --- > > > .../testing/sysfs-driver-panfrost-profiling | 10 + > > > Documentation/gpu/panfrost.rst| 9 > > > drivers/gpu/drm/panfrost/Makefile | 2 - > > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- > > > drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- > > > drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- > > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > > 8 files changed, 57 insertions(+), 44 deletions(-) > > > create mode 100644 > > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > new file mode 100644 > > > index ..1d8bb0978920 > > > --- /dev/null > > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > @@ -0,0 +1,10 @@ > > > +What:/sys/bus/platform/drivers/panfrost/.../profiling > > > +Date:February 2024 > > > +KernelVersion: 6.8.0 > > > +Contact: Adrian Larumbe > > > +Description: > > > + Get/set drm fdinfo's engine and cycles profiling status. > > > + Valid values are: > > > + 0: Don't enable fdinfo job profiling sources. 
> > > + 1: Enable fdinfo job profiling sources, this enables both the > > > GPU's > > > +timestamp and cycle counter registers. > > > \ No newline at end of file > > > diff --git a/Documentation/gpu/panfrost.rst > > > b/Documentation/gpu/panfrost.rst > > > index b80e41f4b2c5..51ba375fd80d 100644 > > > --- a/Documentation/gpu/panfrost.rst > > > +++ b/Documentation/gpu/panfrost.rst > > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > > `drm-curfreq-` values convey the current operating frequency for that > > > engine. > > > + > > > +Users must bear in mind that engine and cycle sampling are disabled by > > > default, > > > +because of power saving concerns. `fdinfo` users and benchmark > > > applications which > > > +query the fdinfo file must make sure to toggle the job profiling status > > > of the > > > +driver by writing into the appropriate sysfs node:: > > > + > > > +echo > > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/profiling > > > > A late thought - how it would work to not output the inactive fdinfo > > keys when this knob is not enabled? > > > > Generic userspace like gputop already handles that and wouldn't show the > > stat. Which may be more user friendly than showing stats permanently at > > zero. It may be moot once you add the auto-toggle to gputop (or so) but > > perhaps worth considering. > > I agree with Tvrtko, if the line being printed in fdinfo relies on some > sysfs knob to be valid, we'd rather not print the information in that > case, instead of printing zero. Me too. I'll go first change both gputop and nvtop to make sure they use the new sysfs knob for Panfrost, and then submit a new patch that handles printing of the drm-cycles-* and drm-engine-* stats depending on the profiling knob state.
[PATCH v3 0/1] drm/panfrost: Replace fdinfo's profiling debugfs knob
This is v3 of the patch already discussed in [2] and [1] Changelog: v3: - Replaced manual kobj initialisation with a device attribute - Handle user input with kstrtobool instead of treating it as an uint v2: - Turned the profile mode atomic variable into a boolean - Rewrote the sysfs file's uAPI documentation to make it more generic - Improved the casting of the profiling variable inside the Panfrost device structure [2]https://lore.kernel.org/dri-devel/20240302154845.3223223-2-adrian.laru...@collabora.com/ [1]https://lore.kernel.org/dri-devel/20240221161237.2478193-1-adrian.laru...@collabora.com/ Adrián Larumbe (1): drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs .../testing/sysfs-driver-panfrost-profiling | 10 + Documentation/gpu/panfrost.rst| 9 drivers/gpu/drm/panfrost/Makefile | 2 - drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 8 files changed, 57 insertions(+), 44 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: e635b7eb7062b464bbd9795308b1a80eac0b01f5 -- 2.43.0
[PATCH v3 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 + Documentation/gpu/panfrost.rst| 9 drivers/gpu/drm/panfrost/Makefile | 2 - drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 8 files changed, 57 insertions(+), 44 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..1d8bb0978920 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/platform/drivers/panfrost/.../profiling +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: + Get/set drm fdinfo's engine and cycles profiling status. + Valid values are: + 0: Don't enable fdinfo job profiling sources. + 1: Enable fdinfo job profiling sources, this enables both the GPU's + timestamp and cycle counter registers. 
\ No newline at end of file diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..51ba375fd80d 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..7da2b3f02ed9 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,6 +12,4 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o - obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates.
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 62f7e3527385..cffcb0ac7c11 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -130,7 +130,7 @@ struct panfrost_device { struct list_head scheduled_jobs; struct panfrost_perfcn
[PATCH 0/2] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Adrián Larumbe (2): drm/panthor: Enable fdinfo for cycle and time measurements drm/panthor: Enable fdinfo for memory stats drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 32 drivers/gpu/drm/panthor/panthor_gem.c | 12 ++ drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- 5 files changed, 254 insertions(+), 28 deletions(-) base-commit: e635b7eb7062b464bbd9795308b1a80eac0b01f5 -- 2.43.0
[PATCH 1/2] drm/panthor: Enable fdinfo for cycle and time measurements
These values are sampled by the firmware right before jumping into the UM command stream and immediately after returning from it, and then kept inside a per-job accounting structure. That structure is held inside the group's syncobjs buffer object, at an offset that depends on the job's queue slot number and the queue's index within the group. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- 4 files changed, 241 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index 7ac4fa290f27..51a7b734edcd 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments. 
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 51c9d61b6796..10e970921ca3 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ u32 *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ff484506229f..fa06b9e2c6cd 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,6 +3,10 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. */ +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER +#include <asm/arch_timer.h> +#endif + #include #include #include @@ -28,6 +32,8 @@ #include "panthor_regs.h" #include "panthor_sched.h" +#define NS_PER_SEC 1000000000ULL + /** * DOC: user <-> kernel object copy helpers. 
*/ @@ -1336,6 +1342,29 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NS_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1345,6 +1374,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mma
[PATCH 2/2] drm/panthor: Enable fdinfo for memory stats
When vm-binding an already-created BO, the entirety of its virtual size is then backed by system memory, so its RSS is always the same as its virtual size. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_drv.c | 1 + drivers/gpu/drm/panthor/panthor_gem.c | 12 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index fa06b9e2c6cd..a5398e161f75 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1363,6 +1363,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + drm_show_memory_stats(p, file); } static const struct file_operations panthor_drm_driver_fops = { diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index d6483266d0c2..845724e3fd93 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.43.0
[PATCH] drm/panthor: Add support for performance counters
This brings in support for Panthor's HW performance counters and querying them from UM through a specific ioctl(). The code is inspired by existing functionality for the Panfrost driver, with some noteworthy differences: - Sample size is now reported by the firmware rather than having to reckon it by hand - Counter samples are chained in a ring buffer that can be accessed concurrently, but only from threads within a single context (this is because of a HW limitation). - List of enabled counters must be explicitly told from UM - Rather than allocating the BO that will contain the perfcounter values in the render context's address space, the samples ring buffer is mapped onto the MCU's VM. - If more than one thread within the same context tries to dump a sample, then the kernel will copy the same frame to every single thread that was able to join the dump queue right before the FW finished processing the sample request. - UM must provide a BO handle for retrieval of perfcnt values rather than passing a user virtual address. The reason multicontext access to the driver's perfcnt ioctl interface isn't tolerated is because toggling a different set of counters than the current one implies a counter reset, which also messes up with the ring buffer's extraction and insertion pointers. This is an unfortunate hardware limitation. 
Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/Makefile | 3 +- drivers/gpu/drm/panthor/panthor_device.c | 6 + drivers/gpu/drm/panthor/panthor_device.h | 6 + drivers/gpu/drm/panthor/panthor_drv.c | 61 +++ drivers/gpu/drm/panthor/panthor_fw.c | 27 ++ drivers/gpu/drm/panthor/panthor_fw.h | 12 + drivers/gpu/drm/panthor/panthor_perfcnt.c | 551 ++ drivers/gpu/drm/panthor/panthor_perfcnt.h | 31 ++ drivers/gpu/drm/panthor/panthor_sched.c | 1 + include/uapi/drm/panthor_drm.h| 72 +++ 10 files changed, 769 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panthor/panthor_perfcnt.c create mode 100644 drivers/gpu/drm/panthor/panthor_perfcnt.h diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile index 15294719b09c..7f841fd053d4 100644 --- a/drivers/gpu/drm/panthor/Makefile +++ b/drivers/gpu/drm/panthor/Makefile @@ -9,6 +9,7 @@ panthor-y := \ panthor_gpu.o \ panthor_heap.o \ panthor_mmu.o \ - panthor_sched.o + panthor_sched.o \ + panthor_perfcnt.o obj-$(CONFIG_DRM_PANTHOR) += panthor.o diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index bfe8da4a6e4c..5dfd82891063 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -20,6 +20,7 @@ #include "panthor_mmu.h" #include "panthor_regs.h" #include "panthor_sched.h" +#include "panthor_perfcnt.h" static int panthor_clk_init(struct panthor_device *ptdev) { @@ -78,6 +79,7 @@ void panthor_device_unplug(struct panthor_device *ptdev) /* Now, try to cleanly shutdown the GPU before the device resources * get reclaimed. 
*/ + panthor_perfcnt_unplug(ptdev); panthor_sched_unplug(ptdev); panthor_fw_unplug(ptdev); panthor_mmu_unplug(ptdev); @@ -233,6 +235,10 @@ int panthor_device_init(struct panthor_device *ptdev) if (ret) goto err_unplug_fw; + ret = panthor_perfcnt_init(ptdev); + if (ret) + goto err_rpm_put; + /* ~3 frames */ pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50); pm_runtime_use_autosuspend(ptdev->base.dev); diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 51c9d61b6796..adf0bd29deb0 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -100,6 +100,9 @@ struct panthor_device { /** @csif_info: Command stream interface information. */ struct drm_panthor_csif_info csif_info; + /** @perfcnt_info: Performance counters interface information. */ + struct drm_panthor_perfcnt_info perfcnt_info; + /** @gpu: GPU management data. */ struct panthor_gpu *gpu; @@ -127,6 +130,9 @@ struct panthor_device { /** @done: ... */ struct completion done; } unplug; + /** @perfcnt: Device performance counters data. */ + struct panthor_perfcnt *perfcnt; + /** @reset: Reset related fields. */ struct { /** @wq: Ordered workqueue used to schedule reset operations. */ diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ff484506229f..6cb9ea0aa553 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -27,6 +27,7 @@ #include "panthor_mmu.h" #include "panthor_regs.h" #include "panthor_sched.h"
[PATCH v2 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 70 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 120 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..889527b71b9d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/.../drivers/panfrost/.../drm/../profiling/status +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: + Get/set drm fdinfo's engine and cycles profiling status. + Valid values are: + 0: Don't enable fdinfo job profiling sources. + 1: Enable fdinfo job profiling sources, this enables both the GPU's + timestamp and cycle counter registers. 
\ No newline at end of file diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..be4ac282ef63 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..6e718595d8a6 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -10,8 +10,7 @@ panfrost-y := \ panfrost_job.o \ panfrost_mmu.o \ panfrost_perfcnt.o \ - panfrost_dump.o - -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + panfrost_dump.o \ + panfrost_sysfs.o obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/dr
[PATCH v2 0/1] drm/panfrost: Replace fdinfo's profiling debugfs knob
This is v2 of the patch already discussed in [1] Changelog: - Turned the profile mode atomic variable into a boolean - Rewrote the sysfs file's uAPI documentation to make it more generic - Improved the casting of the profiling variable inside the Panfrost device structure [1]https://lore.kernel.org/dri-devel/20240221161237.2478193-1-adrian.laru...@collabora.com/ Adrián Larumbe (1): drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 70 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 120 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h base-commit: 216c1282dde38ca87ebdf1ccacee5a0682901574 -- 2.43.0
Re: [PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Hi Boris, On 26.02.2024 09:51, Boris Brezillon wrote: > On Wed, 21 Feb 2024 16:12:32 + > Adrián Larumbe wrote: > > > Debugfs isn't always available in production builds that try to squeeze > > every single byte out of the kernel image, but we still need a way to > > toggle the timestamp and cycle counter registers so that jobs can be > > profiled for fdinfo's drm engine and cycle calculations. > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > the same functionality, and document its ABI in a separate file. > > > > Signed-off-by: Adrián Larumbe > > --- > > .../testing/sysfs-driver-panfrost-profiling | 10 +++ > > Documentation/gpu/panfrost.rst| 9 +++ > > drivers/gpu/drm/panfrost/Makefile | 5 +- > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 > > drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ > > drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 > > 10 files changed, 124 insertions(+), 45 deletions(-) > > create mode 100644 > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > new file mode 100644 > > index ..ce54069714f3 > > --- /dev/null > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > @@ -0,0 +1,10 @@ > > +What: > > /sys/bus/.../drivers/panfrost/.../drm/../profiling/status > > +Date: February 2024 > > +KernelVersion: 6.8.0 > > +Contact: Adrian Larumbe > > +Description: > > +Get/set 
drm fdinfo's engine and cycles profiling status. > > +Valid values are: > > + 0: Disable fdinfo job profiling sources. This disables both the > > GPU's > > +timestamp and cycle counter registers. > > + 1: Enable the above. > > diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst > > index b80e41f4b2c5..be4ac282ef63 100644 > > --- a/Documentation/gpu/panfrost.rst > > +++ b/Documentation/gpu/panfrost.rst > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > `drm-curfreq-` values convey the current operating frequency for that > > engine. > > + > > +Users must bear in mind that engine and cycle sampling are disabled by > > default, > > +because of power saving concerns. `fdinfo` users and benchmark > > applications which > > +query the fdinfo file must make sure to toggle the job profiling status of > > the > > +driver by writing into the appropriate sysfs node:: > > + > > +echo > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling > > + > > +Where `N` is either `0` or `1`, depending on the desired enablement status. > > diff --git a/drivers/gpu/drm/panfrost/Makefile > > b/drivers/gpu/drm/panfrost/Makefile > > index 2c01c1e7523e..6e718595d8a6 100644 > > --- a/drivers/gpu/drm/panfrost/Makefile > > +++ b/drivers/gpu/drm/panfrost/Makefile > > @@ -10,8 +10,7 @@ panfrost-y := \ > > panfrost_job.o \ > > panfrost_mmu.o \ > > panfrost_perfcnt.o \ > > - panfrost_dump.o > > - > > -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o > > + panfrost_dump.o \ > > + panfrost_sysfs.o > > > > obj-$(CONFIG_DRM_PANFROST) += panfrost.o > > diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > b/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > deleted file mode 100644 > > index 72d4286a6bf7.. 
> > --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > +++ /dev/null > > @@ -1,21 +0,0 @@ > > -// SPDX-License-Identifier: GPL-2.0 > > -/* Copyright 2023 Collabora ltd. */ > > -/* Copyright 2023 Amazon.com, Inc. or its affiliates. */ > > - > > -#include > > -#include
Re: [PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Hi Steve, On 21.02.2024 16:52, Steven Price wrote: > On 21/02/2024 16:12, Adrián Larumbe wrote: > > Debugfs isn't always available in production builds that try to squeeze > > every single byte out of the kernel image, but we still need a way to > > toggle the timestamp and cycle counter registers so that jobs can be > > profiled for fdinfo's drm engine and cycle calculations. > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > the same functionality, and document its ABI in a separate file. > > > > Signed-off-by: Adrián Larumbe > > --- > > .../testing/sysfs-driver-panfrost-profiling | 10 +++ > > Documentation/gpu/panfrost.rst| 9 +++ > > drivers/gpu/drm/panfrost/Makefile | 5 +- > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 > > drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ > > drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 > > 10 files changed, 124 insertions(+), 45 deletions(-) > > create mode 100644 > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > new file mode 100644 > > index ..ce54069714f3 > > --- /dev/null > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > @@ -0,0 +1,10 @@ > > +What: > > /sys/bus/.../drivers/panfrost/.../drm/../profiling/status > > +Date: February 2024 > > +KernelVersion: 6.8.0 > > +Contact: Adrian Larumbe > > +Description: > > +Get/set drm fdinfo's 
engine and cycles profiling status. > > +Valid values are: > > + 0: Disable fdinfo job profiling sources. This disables both the > > GPU's > > +timestamp and cycle counter registers. > > + 1: Enable the above. > > Minor point, but if we're going to eventually come up with a generic way > of doing this, then we're going to have to think about backwards > compatibility for this sysfs file. I would expect in this new world '0' > would mean "default behaviour; off unless the new-fangled thing enables > profiling" and '1' means "force on". > > In which case perhaps wording like the below would be clearer: > > 0: Don't enable fdinfo job profiling sources. > 1: Enable fdinfo job profiling sources, this enables both the GPU's >timestamp and cycle counter registers. > > Or am I being too picky over the wording ;) I'm alright with this kind of wording, to keep things as generic as possible. Initially I thought just mentioning 0 and 1 as potential toggle values would be enough, and then every driver could describe their own profiling/status sysfs knob in similar terms, depending on what profiling resources they act upon. > One other small issue below... > > > diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst > > index b80e41f4b2c5..be4ac282ef63 100644 > > --- a/Documentation/gpu/panfrost.rst > > +++ b/Documentation/gpu/panfrost.rst > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > `drm-curfreq-` values convey the current operating frequency for that > > engine. > > + > > +Users must bear in mind that engine and cycle sampling are disabled by > > default, > > +because of power saving concerns. 
`fdinfo` users and benchmark > > applications which > > +query the fdinfo file must make sure to toggle the job profiling status of > > the > > +driver by writing into the appropriate sysfs node:: > > + > > +echo > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling > > + > > +Where `N` is either `0` or `1`, depending on the desired enablement status. > > diff --git a/drivers/gpu/drm/panfrost/Makefile > > b/drivers/gpu/drm/panfrost/Makefile > > index 2c01c1e7523e..6e718595d8a6 10064
[PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 124 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..ce54069714f3 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/.../drivers/panfrost/.../drm/../profiling/status +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: +Get/set drm fdinfo's engine and cycles profiling status. +Valid values are: + 0: Disable fdinfo job profiling sources. This disables both the GPU's +timestamp and cycle counter registers. + 1: Enable the above. 
diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..be4ac282ef63 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..6e718595d8a6 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -10,8 +10,7 @@ panfrost-y := \ panfrost_job.o \ panfrost_mmu.o \ panfrost_perfcnt.o \ - panfrost_dump.o - -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + panfrost_dump.o \ + panfrost_sysfs.o obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/driver
Re: [PATCH 0/1] Always record job cycle and timestamp information
> On 21.02.2024 14:34, Tvrtko Ursulin wrote: > > On 21/02/2024 09:40, Adrián Larumbe wrote: > > Hi, > > > > I just wanted to make sure we're on the same page on this matter. So in > > Panfrost, and I guess in almost every other single driver out there, HW perf > > counters and their uapi interface are orthogonal to fdinfo's reporting on > > drm > > engine utilisation. > > > > At the moment it seems like HW perfcounters and the way they're exposed to > > UM > > are very idiosyncratic and any attempt to unify their interface into a > > common > > set of ioctl's sounds like a gargantuan task I wouldn't like to be faced > > with. > > I share the same feeling on this sub-topic. > > > As for fdinfo, I guess there's more room for coming up with common helpers > > that > > could handle the toggling of HW support for drm engine calculations, but > > I'd at > > least have to see how things are being done in let's say, Freedreno or > > Intel. > > For Intel we don't need this ability, well at least for pre-GuC platforms. > Stat collection is super cheap and permanently enabled there. > > But let me copy Umesh because something at the back of my mind is telling me > that perhaps there was something expensive about collecting these stats with > the GuC backend? If so maybe a toggle would be beneficial there. > > > Right now there's a pressing need to get rid of the debugfs knob for > > fdinfo's > > drm engine profiling sources in Panfrost, after which I could perhaps draw > > up an > > RFC for how to generalise this onto other drivers. > > There is a knob currently meaning fdinfo does not work by default? If that is > so, I would have at least expected someone had submitted a patch for gputop to > handle this toggle. It being kind of a common reference implementation I don't > think it is great if it does not work out of the box. It does sound like I forgot to document this knob at the time I submitted fdinfo support for Panfrost. 
I'll make a point of mentioning it in a new patch where I drop debugfs support and enable toggling from sysfs instead. > The toggle as an idea sounds a bit annoying, but if there is no other > realistic way maybe it is not too bad. As long as it is documented in the > drm-usage-stats.rst, doesn't live in debugfs, and has some common plumbing > implemented both on the kernel side and for the aforementioned gputop / > igt_drm_fdinfo / igt_drm_clients. Where and how exactly TBD. As soon as the new patch is merged, I'll go and reflect the driver uAPI changes in all three of these. > Regards, > > Tvrtko > Cheers, Adrian > > On 16.02.2024 17:43, Tvrtko Ursulin wrote: > > > > > > On 16/02/2024 16:57, Daniel Vetter wrote: > > > > On Wed, Feb 14, 2024 at 01:52:05PM +, Steven Price wrote: > > > > > Hi Adrián, > > > > > > > > > > On 14/02/2024 12:14, Adrián Larumbe wrote: > > > > > > A driver user expressed interest in being able to access engine > > > > > > usage stats > > > > > > through fdinfo when debugfs is not built into their kernel. In the > > > > > > current > > > > > > implementation, this wasn't possible, because it was assumed even > > > > > > for > > > > > > inflight jobs enabling the cycle counter and timestamp registers > > > > > > would > > > > > > incur in additional power consumption, so both were kept disabled > > > > > > until > > > > > > toggled through debugfs. > > > > > > > > > > > > A second read of the TRM made me think otherwise, but this is > > > > > > something > > > > > > that would be best clarified by someone from ARM's side. > > > > > > > > > > I'm afraid I can't give a definitive answer. This will probably vary > > > > > depending on implementation. The command register enables/disables > > > > > "propagation" of the cycle/timestamp values. This propagation will > > > > > cost > > > > > some power (gates are getting toggled) but whether that power is > > > > > completely in the noise of the GPU as a whole I can't say. 
> > > > > > > > > > The out-of-tree kbase driver only enables the counters for jobs > > > > > explicitly marked (BASE_JD_REQ_PERMON) or due to an explicit > > > > > connection > > > > > from a profiler. > > > > > > > > > > I
Re: [PATCH 0/1] Always record job cycle and timestamp information
Hi, I just wanted to make sure we're on the same page on this matter. So in Panfrost, and I guess in almost every other single driver out there, HW perf counters and their uapi interface are orthogonal to fdinfo's reporting on drm engine utilisation. At the moment it seems like HW perfcounters and the way they're exposed to UM are very idiosyncratic and any attempt to unify their interface into a common set of ioctl's sounds like a gargantuan task I wouldn't like to be faced with. As for fdinfo, I guess there's more room for coming up with common helpers that could handle the toggling of HW support for drm engine calculations, but I'd at least have to see how things are being done in let's say, Freedreno or Intel. Right now there's a pressing need to get rid of the debugfs knob for fdinfo's drm engine profiling sources in Panfrost, after which I could perhaps draw up an RFC for how to generalise this onto other drivers. Adrian On 16.02.2024 17:43, Tvrtko Ursulin wrote: > > On 16/02/2024 16:57, Daniel Vetter wrote: > > On Wed, Feb 14, 2024 at 01:52:05PM +, Steven Price wrote: > > > Hi Adrián, > > > > > > On 14/02/2024 12:14, Adrián Larumbe wrote: > > > > A driver user expressed interest in being able to access engine usage > > > > stats > > > > through fdinfo when debugfs is not built into their kernel. In the > > > > current > > > > implementation, this wasn't possible, because it was assumed even for > > > > inflight jobs enabling the cycle counter and timestamp registers would > > > > incur additional power consumption, so both were kept disabled until > > > > toggled through debugfs. > > > > > > > > A second read of the TRM made me think otherwise, but this is something > > > > that would be best clarified by someone from ARM's side. > > > > > > I'm afraid I can't give a definitive answer. This will probably vary > > > depending on implementation. The command register enables/disables > > > "propagation" of the cycle/timestamp values. 
This propagation will cost > > > some power (gates are getting toggled) but whether that power is > > > completely in the noise of the GPU as a whole I can't say. > > > > > > The out-of-tree kbase driver only enables the counters for jobs > > > explicitly marked (BASE_JD_REQ_PERMON) or due to an explicit connection > > > from a profiler. > > > > > > I'd be happier moving the debugfs file to sysfs rather than assuming > > > that the power consumption is small enough for all platforms. > > > > > > Ideally we'd have some sort of kernel interface for a profiler to inform > > > the kernel what it is interested in, but I can't immediately see how to > > > make that useful across different drivers. kbase's profiling support is > > > great with our profiling tools, but there's a very strong connection > > > between the two. > > > > Yeah I'm not sure whether a magic (worse probably per-driver massively > > different) file in sysfs is needed to enable gpu perf monitoring stats in > > fdinfo. > > > > I get that we do have a bit a gap because the linux perf pmu stuff is > > global, and you want per-process, and there's kinda no per-process support > > for perf stats for devices. But that's probably the direction we want to > > go, not so much fdinfo. At least for hardware performance counters and > > things like that. > > > > Iirc the i915 pmu support had some integration for per-process support, > > you might want to chat with Tvrtko for kernel side and Lionel for more > > userspace side. At least if I'm not making a complete mess and my memory > > is vaguely related to reality. Adding them both. > > Yeah there are two separate things, i915 PMU and i915 Perf/OA. > > If my memory serves me right I indeed did have a per-process support for i915 > PMU implemented as an RFC (or at least a branch somewhere) some years back. > IIRC it only exposed the per engine GPU utilisation and did not find it very > useful versus the complexity. 
(I think it at least required maintaining a map > of drm clients per task.) > > Our more useful profiling is using a custom Perf/OA interface (Observation > Architecture) which is possibly similar to kbase mentioned above. Why it is a > custom interface is explained in a large comment on top of i915_perf.c. Not > sure if all of them still hold but on the overall perf does not sound like the > right fit for detailed GPU profiling. > > Also PMU drivers are very challenging to get the implementation right, since > locking model an
[PATCH 0/1] Always record job cycle and timestamp information
A driver user expressed interest in being able to access engine usage stats through fdinfo when debugfs is not built into their kernel. In the current implementation, this wasn't possible, because it was assumed even for inflight jobs enabling the cycle counter and timestamp registers would incur additional power consumption, so both were kept disabled until toggled through debugfs. A second read of the TRM made me think otherwise, but this is something that would be best clarified by someone from ARM's side. Adrián Larumbe (1): drm/panfrost: Always record job cycle and timestamp information drivers/gpu/drm/panfrost/Makefile | 2 -- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h | 1 - drivers/gpu/drm/panfrost/panfrost_drv.c | 5 - drivers/gpu/drm/panfrost/panfrost_job.c | 24 - drivers/gpu/drm/panfrost/panfrost_job.h | 1 - 7 files changed, 9 insertions(+), 59 deletions(-) delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: 6b1f93ea345947c94bf3a7a6e668a2acfd310918 -- 2.43.0
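For context on the consumer side: tools like gputop and nvtop sample /proc/<pid>/fdinfo/<fd> periodically and diff the cumulative counters. A rough sketch of that userspace computation, with hypothetical helper names and synthetic input, assuming the `drm-engine-<name>: <ns> ns` key format from drm-usage-stats.rst:

```python
def parse_fdinfo(text):
    """Split a DRM fdinfo blob into a {key: value} dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats

def engine_busy_percent(before, after, engine, wall_ns):
    """Busy percentage of one engine between two samples wall_ns apart.

    drm-engine-<name> values are cumulative busy nanoseconds, so the
    delta over the sampling interval gives the busy fraction.
    """
    key = "drm-engine-" + engine
    busy_ns = int(after[key].split()[0]) - int(before[key].split()[0])
    return 100.0 * busy_ns / wall_ns

before = parse_fdinfo("drm-driver: panfrost\ndrm-engine-fragment: 1000 ns\n")
after = parse_fdinfo("drm-driver: panfrost\ndrm-engine-fragment: 6000 ns\n")
print(engine_busy_percent(before, after, "fragment", 10_000))  # 5000 of 10000 ns busy -> 50.0
```

The same delta-over-interval approach applies to `drm-cycles-<engine>` once the cycle counters in this series are always recorded.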
[PATCH 1/1] drm/panfrost: Always record job cycle and timestamp information
Some users of Panfrost expressed interest in being able to gather fdinfo stats for running jobs, on production builds with no built-in debugfs support. Sysfs was first considered, but eventually it was realised timestamp and cycle counting don't incur additional power consumption when the GPU is running and there are inflight jobs, so there's no reason to let user space toggle profiling. Remove the profiling debugfs knob altogether so that cycle and timestamp counting is always enabled for inflight jobs. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/Makefile | 2 -- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h | 1 - drivers/gpu/drm/panfrost/panfrost_drv.c | 5 - drivers/gpu/drm/panfrost/panfrost_job.c | 24 - drivers/gpu/drm/panfrost/panfrost_job.h | 1 - 7 files changed, 9 insertions(+), 59 deletions(-) delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..7da2b3f02ed9 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,6 +12,4 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o - obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 62f7e3527385..cd6bbcb2bea4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -130,7 +130,6 @@ struct panfrost_device { struct list_head scheduled_jobs; struct panfrost_perfcnt *perfcnt; - atomic_t profile_mode; struct mutex sched_lock; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index a926d71e8131..e31fd4d62bbe 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -20,7 +20,6 @@ #include "panfrost_job.h" #include "panfrost_gpu.h" #include "panfrost_perfcnt.h" -#include "panfrost_debugfs.h" static bool unstable_ioctls; module_param_unsafe(unstable_ioctls, bool, 0600); @@ -600,10 +599,6 @@ static const struct drm_driver panfrost_drm_driver = { .gem_create_object = panfrost_gem_create_object, .gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table, - -#ifdef 
CONFIG_DEBUG_FS - .debugfs_init = panfrost_debugfs_init, -#endif }; static int panfrost_probe(struct platform_device *pdev) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 0c2dbf6ef2a5..745b16a77edd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -159,13 +159,11 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) struct panfrost_job *job = pfdev->jobs[slot][0]; WARN_ON(!job); - if (job->is_profiled) { - if (job->engine_usage) { - job->engine_usage->elapsed_ns[slot] += - ktime_to_ns(ktime_sub(ktime_get(), job->start_time)); - job->engine_usage->cycles[slot] += - panfrost_cy
[PATCH 2/2] drm/panfrost: Fix incorrect updating of current device frequency
It was noticed that, after setting Panfrost's devfreq to the performance governor, the GPU frequency as reported by fdinfo dropped to 0 permanently. There are two separate issues causing this behaviour:
- Not initialising the device's current_frequency variable to its original value during device probe().
- Updating said variable in Panfrost devfreq's get_dev_status() rather than after the new OPP's frequency had been retrieved in target(), which meant the old frequency would be assigned instead.
Signed-off-by: Adrián Larumbe Fixes: f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") --- drivers/gpu/drm/panfrost/panfrost_devfreq.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index f59c82ea8870..2d30da38c2c3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -29,14 +29,20 @@ static void panfrost_devfreq_update_utilization(struct panfrost_devfreq *pfdevfr static int panfrost_devfreq_target(struct device *dev, unsigned long *freq, u32 flags) { + struct panfrost_device *ptdev = dev_get_drvdata(dev); struct dev_pm_opp *opp; + int err; opp = devfreq_recommended_opp(dev, freq, flags); if (IS_ERR(opp)) return PTR_ERR(opp); dev_pm_opp_put(opp); - return dev_pm_opp_set_rate(dev, *freq); + err = dev_pm_opp_set_rate(dev, *freq); + if (!err) + ptdev->pfdevfreq.current_frequency = *freq; + + return err; } static void panfrost_devfreq_reset(struct panfrost_devfreq *pfdevfreq) @@ -58,7 +64,6 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); - pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -164,6 +169,14 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) 
panfrost_devfreq_profile.initial_freq = cur_freq; + /* +* We could wait until panfrost_devfreq_target() to set this value, but +* since the simple_ondemand governor works asynchronously, there's a +* chance by the time someone opens the device's fdinfo file, current +* frequency hasn't been updated yet, so let's just do an early set. +*/ + pfdevfreq->current_frequency = cur_freq; + /* * Set the recommend OPP this will enable and configure the regulator * if any and will avoid a switch off by regulator_late_cleanup() -- 2.42.0
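The interaction the two hunks above fix can be pictured with a toy userspace model (hypothetical names, not driver code): without the probe-time seed the cached value starts at 0, and updating it from get_dev_status() only ever records the frequency devfreq hands in, which trails the last target() transition:

```python
class FreqModel:
    """Toy model of the cached current_frequency, before/after the fix."""
    def __init__(self, probe_freq, fixed):
        self.fixed = fixed
        # Fix 1: seed the cache with the probe-time clock rate.
        self.current_frequency = probe_freq if fixed else 0

    def target(self, new_freq):
        # Fix 2: update the cache where the new rate is actually set...
        if self.fixed:
            self.current_frequency = new_freq

    def get_dev_status(self, previous_freq):
        # ...instead of here, where only the *old* frequency is visible.
        if not self.fixed:
            self.current_frequency = previous_freq
        return self.current_frequency

buggy = FreqModel(500_000_000, fixed=False)
ok = FreqModel(500_000_000, fixed=True)
# fdinfo is read before any DVFS transition has happened:
print(buggy.get_dev_status(previous_freq=0))  # -> 0 (frequency "dropped to 0")
print(ok.get_dev_status(previous_freq=0))     # -> 500000000
```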
[PATCH 0/2] Panfrost devfreq and GEM status fixes
During recent development of the Mali CSF GPU Panthor driver, a user noticed that GPU frequency values as reported by fdinfo were incorrect. This was traced back to incorrect handling of frequency value updates. The same problem was seen in Panfrost. Also, GEM objects created from a dma-buf import should be considered resident in system memory, so that this can be reflected in fdinfo. Adrián Larumbe (2): drm/panfrost: Consider dma-buf imported objects as resident drm/panfrost: Fix incorrect updating of current device frequency drivers/gpu/drm/panfrost/panfrost_devfreq.c | 17 +++-- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 2 files changed, 16 insertions(+), 3 deletions(-) base-commit: 38f922a563aac3148ac73e73689805917f034cb5 -- 2.42.0
[PATCH 1/2] drm/panfrost: Consider dma-buf imported objects as resident
A GEM object constructed from a dma-buf imported sgtable should be regarded as being memory resident, because the dma-buf API mandates backing storage to be allocated when attachment succeeds. Signed-off-by: Adrián Larumbe Fixes: 9ccdac7aa822 ("drm/panfrost: Add fdinfo support for memory stats") Reported-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 0cf64456e29a..d47b40b82b0b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -200,7 +200,7 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj struct panfrost_gem_object *bo = to_panfrost_bo(obj); enum drm_gem_object_status res = 0; - if (bo->base.pages) + if (bo->base.base.import_attach || bo->base.pages) res |= DRM_GEM_OBJECT_RESIDENT; if (bo->base.madv == PANFROST_MADV_DONTNEED) -- 2.42.0
Re: [PATCH] drm/panfrost: Remove incorrect IS_ERR() check
On 20.10.2023 11:44, Steven Price wrote: > sg_page_iter_page() doesn't return an error code, so the IS_ERR() check > is wrong and the error path will never be executed. This also allows > simplifying the code to remove the local variable 'page'. > > CC: Adrián Larumbe > Reported-by: Dan Carpenter > Closes: > https://lore.kernel.org/r/376713ff-9a4f-4ea3-b097-fb5efb685d95@moroto.mountain > Signed-off-by: Steven Price Reviewed-by: Adrián Larumbe Tested-by: Adrián Larumbe > --- > drivers/gpu/drm/panfrost/panfrost_dump.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_dump.c > b/drivers/gpu/drm/panfrost/panfrost_dump.c > index e7942ac449c6..47751302f1bc 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_dump.c > +++ b/drivers/gpu/drm/panfrost/panfrost_dump.c > @@ -220,16 +220,8 @@ void panfrost_core_dump(struct panfrost_job *job) > > iter.hdr->bomap.data[0] = bomap - bomap_start; > > - for_each_sgtable_page(bo->base.sgt, &page_iter, 0) { > - struct page *page = sg_page_iter_page(&page_iter); > - > - if (!IS_ERR(page)) { > - *bomap++ = page_to_phys(page); > - } else { > - dev_err(pfdev->dev, "Panfrost Dump: wrong > page\n"); > - *bomap++ = 0; > - } > - } > + for_each_sgtable_page(bo->base.sgt, &page_iter, 0) > + *bomap++ = page_to_phys(sg_page_iter_page(&page_iter)); > > iter.hdr->bomap.iova = mapping->mmnode.start << PAGE_SHIFT; > > -- > 2.34.1
[PATCH] Documentation/gpu: fix Panfrost documentation build warnings
Fix issues revealed by `make htmldocs` after adding Panfrost DRM documentation file. Signed-off-by: Adrián Larumbe Fixes: d124dac2089c ("drm/panfrost: Add fdinfo support GPU load metrics") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310030917.txzlpoeq-...@intel.com --- Documentation/gpu/drivers.rst | 1 + Documentation/gpu/panfrost.rst | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/gpu/drivers.rst b/Documentation/gpu/drivers.rst index 3a52f48215a3..45a12e552091 100644 --- a/Documentation/gpu/drivers.rst +++ b/Documentation/gpu/drivers.rst @@ -18,6 +18,7 @@ GPU Driver Documentation xen-front afbc komeda-kms + panfrost .. only:: subproject and html diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index ecc48ba5ac11..a07f6821e994 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -5,7 +5,7 @@ .. _panfrost-usage-stats: Panfrost DRM client usage stats implementation -== +== The drm/Panfrost driver implements the DRM client usage stats specification as documented in :ref:`drm-client-usage-stats`. -- 2.42.0
[PATCH v8 3/5] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 97e5bc4a82c8..b834777b409b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -568,6 +568,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..de238b71b321 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,20 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + if (bo->base.madv == PANFROST_MADV_DONTNEED) + res |= DRM_GEM_OBJECT_PURGEABLE; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +220,7 @@ static const 
struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v8 5/5] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index de238b71b321..0cf64456e29a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -209,6 +209,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } + + return 0; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -221,6 +235,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h 
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..846dd697c410 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
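A small userspace model of the heap BO accounting introduced above (hypothetical names; the 2 MiB step mirrors the SZ_2M increment in the GPU page fault handler):

```python
SZ_2M = 2 << 20

class HeapBo:
    """Toy model of a Panfrost heap BO's on-demand backing."""
    def __init__(self, virtual_size=128 << 20):
        self.size = virtual_size      # full GPU VA reservation
        self.heap_rss_size = 0        # physically backed portion

    def gpu_page_fault(self):
        # Mirrors bo->heap_rss_size += SZ_2M in the fault handler,
        # capped at the virtual size for the sake of the model.
        self.heap_rss_size = min(self.heap_rss_size + SZ_2M, self.size)

    def rss(self):
        # What the .rss() GEM hook would report for a heap object.
        return self.heap_rss_size

bo = HeapBo()
for _ in range(3):
    bo.gpu_page_fault()
print(bo.rss() >> 20, "MiB of", bo.size >> 20, "MiB")  # 3 faults -> 6 MiB of 128 MiB
```

With three faults served, the rss() hook reports 6 MiB rather than the 128 MiB virtual reservation.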
[PATCH v8 4/5] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return more accurate RSS and purgeable sizes for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/drm_file.c | 8 +--- include/drm/drm_gem.h | 9 + 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..9a1bd8d0d785 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -930,6 +930,8 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) spin_lock(&file->table_lock); idr_for_each_entry (&file->object_idr, obj, id) { enum drm_gem_object_status s = 0; + size_t add_size = (obj->funcs && obj->funcs->rss) ? 
+ obj->funcs->rss(obj) : obj->size; if (obj->funcs && obj->funcs->status) { s = obj->funcs->status(obj); @@ -944,7 +946,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + status.resident += add_size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: @@ -953,14 +955,14 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) { - status.active += obj->size; + status.active += add_size; /* If still active, don't count as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (s & DRM_GEM_OBJECT_PURGEABLE) - status.purgeable += obj->size; + status.purgeable += add_size; } spin_unlock(&file->table_lock); diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
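The accounting loop in drm_show_memory_stats can be sanity-checked with a minimal userspace re-implementation (plain dicts instead of GEM objects; field names are illustrative only):

```python
def show_memory_stats(objects):
    """Userspace sketch of the drm_show_memory_stats() accounting with
    the rss hook: each object's resident/active/purgeable contribution
    uses the driver-reported RSS when present, else the full size."""
    totals = {"resident": 0, "active": 0, "purgeable": 0}
    for obj in objects:
        add_size = obj["rss"] if obj.get("rss") is not None else obj["size"]
        status = set(obj.get("status", ()))
        if "RESIDENT" in status:
            totals["resident"] += add_size
        else:
            # Already purged or not yet backed by pages: not purgeable.
            status.discard("PURGEABLE")
        if obj.get("active"):
            totals["active"] += add_size
            # Still active on the GPU: don't count as purgeable either.
            status.discard("PURGEABLE")
        if "PURGEABLE" in status:
            totals["purgeable"] += add_size
    return totals

# A 128 MiB tiler heap with only 4 MiB faulted in no longer inflates
# the resident total:
heap = {"size": 128 << 20, "rss": 4 << 20, "status": {"RESIDENT"}}
print(show_memory_stats([heap])["resident"] >> 20)  # -> 4
```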
[PATCH v8 1/5] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR 0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START 0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN 0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE 0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
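One practical note on the LO/HI register pair: a free-running 64-bit counter read as two 32-bit halves can tear if LO carries into HI between the reads. A common mitigation, sketched here in Python against a fake register sequence rather than the actual driver code, is to re-read HI until it is stable:

```python
def read_counter64(read_hi, read_lo):
    """Torn-read-safe sample of a 64-bit counter split across two
    32-bit registers: retry while HI changed around the LO read."""
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:
            return (hi << 32) | lo

# Fake register file where a carry lands right between the HI and LO reads.
seq = iter([0x0,          # HI before the carry
            0xFFFFFFFF,   # LO
            0x1,          # HI after the carry -> mismatch, retry
            0x1, 0x5, 0x1])  # stable second attempt
read_hi = read_lo = lambda: next(seq)
print(hex(read_counter64(read_hi, read_lo)))  # -> 0x100000005
```

Without the retry, the first attempt would have produced 0x00000000FFFFFFFF or 0x1FFFFFFFF, both far from the true value.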
[PATCH v8 2/5] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons:
- Excess time because of the delay between the end of job processing, the subsequent job IRQ, and the actual time of the sample.
- Time spent in the engine queue waiting for the GPU to pick up the next job.
To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled. The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary. Also add a documentation file explaining the possible values for fdinfo's engine keystrings and Panfrost-specific drm-curfreq- pairs. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 ++ drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 58 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 14 files changed, 233 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/panfrost.rst create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index fe35a291ff3e..8d963cd7c1b7 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -169,3 +169,4 @@ Driver specific implementations --- :ref:`i915-usage-stats` +:ref:`panfrost-usage-stats` diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst new file mode 100644 index ..ecc48ba5ac11 --- /dev/null +++ b/Documentation/gpu/panfrost.rst @@ -0,0 +1,38 @@ +=== + drm/Panfrost Mali Driver +=== + +.. _panfrost-usage-stats: + +Panfrost DRM client usage stats implementation +== + +The drm/Panfrost driver implements the DRM client usage stats specification as +documented in :ref:`drm-client-usage-stats`. 
+ +Example of the output showing the implemented key value pairs and entirety of +the currently possible format options: + +:: + pos:0 + flags: 0242 + mnt_id: 27 + ino:531 + drm-driver: panfrost + drm-client-id: 14 + drm-engine-fragment:1846584880 ns + drm-cycles-fragment:1424359409 + drm-maxfreq-fragment: 79987 Hz + drm-curfreq-fragment: 79987 Hz + drm-engine-vertex-tiler:71932239 ns + drm-cycles-vertex-tiler:52617357 + drm-maxfreq-vertex-tiler: 79987 Hz + drm-curfreq-vertex-tiler: 79987 Hz + drm-total-memory: 290 MiB + drm-shared-memory: 0 MiB + drm-active-memory: 226 MiB + drm-resident-memory:36496 KiB + drm-purgeable-memory: 128 KiB + +Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. +`drm-curfreq-` values convey the current operating frequency for that engine. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y
[PATCH v8 0/5] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
 - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units.
 - Removed locking of shrinker's mutex in GEM obj status function.

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
 - Changed fdinfo engine names to something more descriptive.
 - Mentioned GPU cycle counts aren't an exact measure.
 - Handled the case when job->priv might be NULL.
 - Handled 32-bit overflow of the cycle register.
 - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers.
 - Removed special handling of Prime imported BO RSS.
 - Use rss_size only for heap objects.
 - Use bo->base.madv instead of a specific purgeable flag.
 - Fixed kernel test robot warnings.

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
 - Moved cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy.
 - Made sure cycle counter refs are released in the reset path.
 - Dropped the model param for toggling cycle counting and left it down to the debugfs file.
 - Don't disable the cycle counter when toggling the debugfs file; let the refcounting logic handle it instead.
 - Removed the fdinfo data nested structure definition and 'names' field.
 - When incrementing BO RSS size in the GPU MMU page fault IRQ handler, assume a granularity of 2 MiB for every successful mapping.
 - drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
 - Removed explicit initialisation of the atomic variable for profiling mode, as it's allocated with kzalloc.
 - Pass the engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
 - Removed double reading of the cycle counter register and ktime in the job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
 - Moved putting of the cycle counting refcount into the panfrost job dequeue function to avoid repetition.

v6: https://lore.kernel.org/lkml/c73ad42b-a8db-23c2-86c7-1a2939dba...@linux.intel.com/T/
 - Fixed wrong swapped-round engine time and cycle values in fdinfo drm print statements.

v7: https://lore.kernel.org/lkml/20230927213133.1651169-6-adrian.laru...@collabora.com/T/
 - Make sure an object's actual RSS size is added to the overall fdinfo's purgeable and active size tally when it's both resident and purgeable or active.
 - Create a drm/panfrost.rst documentation file with the meaning of fdinfo strings.
 - BUILD_BUG_ON checking the engine name array size for fdinfo.
 - Added copyright notices for Amazon in Panfrost's new debugfs files.
 - Discarded the fdinfo memory stats unit size selection patch.

v8:
 - Style improvements and addressing nitpicks.
Adrián Larumbe (5): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 + drivers/gpu/drm/drm_file.c | 8 +-- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 60 - drivers/gpu/drm/panfrost/panfrost_gem.c | 30 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h
[PATCH v7 2/5] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately.

Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations.

It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess from the actual figure because of two reasons:
 - Excess time because of the delay between the end of a job processing, the subsequent job IRQ and the actual time of the sample.
 - Time spent in the engine queue waiting for the GPU to pick up the next job.

To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary, because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled.

The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary.

Also add a documentation file explaining the possible values for fdinfo's engine keystrings and Panfrost-specific drm-curfreq- pairs.
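For illustration, here is roughly how a monitoring tool could turn two fdinfo samples into a utilisation percentage. This is a sketch with made-up field plumbing, not actual nvtop/gputop code; the values correspond to a drm-engine-<name> counter (accumulated busy nanoseconds) read at two points in time:

```c
#include <stdint.h>

/* One fdinfo reading for a given engine. */
struct fdinfo_sample {
	uint64_t engine_busy_ns;	/* drm-engine-<name> value */
	uint64_t sampled_at_ns;		/* wall-clock time of the read */
};

/* Busy-time delta over wall-time delta, as a percentage.
 * A cycle-based estimate would instead divide the drm-cycles-<name>
 * delta by drm-curfreq-<name> * elapsed seconds, which is why the
 * current frequency is exported at all. */
static double engine_utilisation(const struct fdinfo_sample *s0,
				 const struct fdinfo_sample *s1)
{
	uint64_t elapsed = s1->sampled_at_ns - s0->sampled_at_ns;

	if (elapsed == 0)
		return 0.0;
	return 100.0 * (double)(s1->engine_busy_ns - s0->engine_busy_ns) /
	       (double)elapsed;
}
```

Because the exported busy time is over-reported for the reasons listed above (IRQ-to-sample delay, queue wait time), the resulting percentage is an upper bound on real engine usage.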
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 ++ drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 58 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 14 files changed, 233 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/panfrost.rst create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index fe35a291ff3e..8d963cd7c1b7 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -169,3 +169,4 @@ Driver specific implementations --- :ref:`i915-usage-stats` +:ref:`panfrost-usage-stats` diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst new file mode 100644 index ..ecc48ba5ac11 --- /dev/null +++ b/Documentation/gpu/panfrost.rst @@ -0,0 +1,38 @@ +=== + drm/Panfrost Mali Driver +=== + +.. _panfrost-usage-stats: + +Panfrost DRM client usage stats implementation +== + +The drm/Panfrost driver implements the DRM client usage stats specification as +documented in :ref:`drm-client-usage-stats`. 
+ +Example of the output showing the implemented key value pairs and entirety of +the currently possible format options: + +:: + pos:0 + flags: 0242 + mnt_id: 27 + ino:531 + drm-driver: panfrost + drm-client-id: 14 + drm-engine-fragment:1846584880 ns + drm-cycles-fragment:1424359409 + drm-maxfreq-fragment: 79987 Hz + drm-curfreq-fragment: 79987 Hz + drm-engine-vertex-tiler:71932239 ns + drm-cycles-vertex-tiler:52617357 + drm-maxfreq-vertex-tiler: 79987 Hz + drm-curfreq-vertex-tiler: 79987 Hz + drm-total-memory: 290 MiB + drm-shared-memory: 0 MiB + drm-active-memory: 226 MiB + drm-resident-memory:36496 KiB + drm-purgeable-memory: 128 KiB + +Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. +`drm-curfreq-` values convey the current operating frequency for that engine. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o
[PATCH v7 4/5] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless.

This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults.

Provide a new generic DRM object function that allows drivers to return more accurate RSS and purgeable sizes for their BOs.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 8 +--- include/drm/drm_gem.h | 9 + 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..9a1bd8d0d785 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -930,6 +930,8 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) spin_lock(&file->table_lock); idr_for_each_entry (&file->object_idr, obj, id) { enum drm_gem_object_status s = 0; + size_t add_size = (obj->funcs && obj->funcs->rss) ?
+ obj->funcs->rss(obj) : obj->size; if (obj->funcs && obj->funcs->status) { s = obj->funcs->status(obj); @@ -944,7 +946,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + status.resident += add_size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: @@ -953,14 +955,14 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) { - status.active += obj->size; + status.active += add_size; /* If still active, don't count as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (s & DRM_GEM_OBJECT_PURGEABLE) - status.purgeable += obj->size; + status.purgeable += add_size; } spin_unlock(&file->table_lock); diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** + * @rss: + * + * Return resident size of the object in physical memory. + * + * Called by drm_show_memory_stats(). + */ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v7 5/5] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time.

This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = &drm_gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++
b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..846dd697c410 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v7 0/5] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
 - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units.
 - Removed locking of shrinker's mutex in GEM obj status function.

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
 - Changed fdinfo engine names to something more descriptive.
 - Mentioned GPU cycle counts aren't an exact measure.
 - Handled the case when job->priv might be NULL.
 - Handled 32-bit overflow of the cycle register.
 - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers.
 - Removed special handling of Prime imported BO RSS.
 - Use rss_size only for heap objects.
 - Use bo->base.madv instead of a specific purgeable flag.
 - Fixed kernel test robot warnings.

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
 - Moved cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy.
 - Made sure cycle counter refs are released in the reset path.
 - Dropped the model param for toggling cycle counting and left it down to the debugfs file.
 - Don't disable the cycle counter when toggling the debugfs file; let the refcounting logic handle it instead.
 - Removed the fdinfo data nested structure definition and 'names' field.
 - When incrementing BO RSS size in the GPU MMU page fault IRQ handler, assume a granularity of 2 MiB for every successful mapping.
 - drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
 - Removed explicit initialisation of the atomic variable for profiling mode, as it's allocated with kzalloc.
 - Pass the engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
 - Removed double reading of the cycle counter register and ktime in the job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
 - Moved putting of the cycle counting refcount into the panfrost job dequeue function to avoid repetition.

v6: https://lore.kernel.org/lkml/c73ad42b-a8db-23c2-86c7-1a2939dba...@linux.intel.com/T/
 - Fixed wrong swapped-round engine time and cycle values in fdinfo drm print statements.

v7:
 - Make sure an object's actual RSS size is added to the overall fdinfo's purgeable and active size tally when it's both resident and purgeable or active.
 - Create a drm/panfrost.rst documentation file with the meaning of fdinfo strings.
 - BUILD_BUG_ON checking the engine name array size for fdinfo.
 - Added copyright notices for Amazon in Panfrost's new debugfs files.
 - Discarded the fdinfo memory stats unit size selection patch.
Adrián Larumbe (5): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 + drivers/gpu/drm/drm_file.c | 8 +-- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 60 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 20 files changed, 289 insertions(+), 4 deletions(-) create mode 100644 Documentati
[PATCH v7 1/5] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
[PATCH v7 3/5] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 97e5bc4a82c8..b834777b409b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -568,6 +568,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ?
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
Re: [PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 21.09.2023 11:14, Tvrtko Ursulin wrote: > >On 20/09/2023 16:32, Tvrtko Ursulin wrote: >> >> On 20/09/2023 00:34, Adrián Larumbe wrote: >> > The current implementation will try to pick the highest available size >> > display unit as soon as the BO size exceeds that of the previous >> > multiplier. That can lead to loss of precision in contexts of low memory >> > usage. >> > >> > The new selection criteria try to preserve precision, whilst also >> > increasing the display unit selection threshold to render more accurate >> > values. >> > >> > Signed-off-by: Adrián Larumbe >> > Reviewed-by: Boris Brezillon >> > Reviewed-by: Steven Price >> > --- >> > drivers/gpu/drm/drm_file.c | 5 - >> > 1 file changed, 4 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> > index 762965e3d503..34cfa128ffe5 100644 >> > --- a/drivers/gpu/drm/drm_file.c >> > +++ b/drivers/gpu/drm/drm_file.c >> > @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct >> > drm_pending_event *e) >> > } >> > EXPORT_SYMBOL(drm_send_event); >> > +#define UPPER_UNIT_THRESHOLD 100 >> > + >> > static void print_size(struct drm_printer *p, const char *stat, >> > const char *region, u64 sz) >> > { >> > @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, >> > const char *stat, >> > unsigned u; >> > for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> > - if (sz < SZ_1K) >> > + if ((sz & (SZ_1K - 1)) && >> >> IS_ALIGNED worth it at all? >> >> > + sz < UPPER_UNIT_THRESHOLD * SZ_1K) >> > break; >> >> Excuse me for a late comment (I was away). I did not get what what is >> special about a ~10% threshold? Sounds to me just going with the lower >> unit, when size is not aligned to the higher one, would be better than >> sometimes precision-sometimes-not. > >FWIW both current and the threshold option make testing the feature very >annoying. How so? >So I'd really propose we simply use smaller unit when unaligned. 
Like I said in the previous reply, for drm files whose overall BO size sum is enormous but not a multiple of a MiB, this would render huge number representations in KiB. I don't find this particularly comfortable to read, and then this extra precision would mean nothing to nvtop or gputop, which would have to scale the size to their available screen dimensions when plotting them. >Regards, > >Tvrtko
Re: [PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 20.09.2023 16:32, Tvrtko Ursulin wrote: > >On 20/09/2023 00:34, Adrián Larumbe wrote: >> The current implementation will try to pick the highest available size >> display unit as soon as the BO size exceeds that of the previous >> multiplier. That can lead to loss of precision in contexts of low memory >> usage. >> >> The new selection criteria try to preserve precision, whilst also >> increasing the display unit selection threshold to render more accurate >> values. >> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/drm_file.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 762965e3d503..34cfa128ffe5 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct >> drm_pending_event *e) >> } >> EXPORT_SYMBOL(drm_send_event); >> +#define UPPER_UNIT_THRESHOLD 100 >> + >> static void print_size(struct drm_printer *p, const char *stat, >> const char *region, u64 sz) >> { >> @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char >> *stat, >> unsigned u; >> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> -if (sz < SZ_1K) >> +if ((sz & (SZ_1K - 1)) && > >IS_ALIGNED worth it at all? This could look better, yeah. >> +sz < UPPER_UNIT_THRESHOLD * SZ_1K) >> break; > >Excuse me for a late comment (I was away). I did not get what what is special >about a ~10% threshold? Sounds to me just going with the lower unit, when size >is not aligned to the higher one, would be better than sometimes >precision-sometimes-not. We had a bit of a debate over this in previous revisions of the patch. 
It all began when a Panfrost user complained that for relatively small BOs, they were losing precision in the fdinfo file because the sum of the sizes of all BOs for a drm file was in the order of MiBs, but not big enough to warrant losing accuracy when plotting them on nvtop or gputop. At first I thought of letting drivers pick their own preferred unit, but this would lead to inconsistency in the units presented in the fdinfo file across different DRM devices. Rob then suggested imposing a unit multiple threshold, while Boris made the suggestion of checking for unit size alignment to lessen precision loss. In the end Rob thought that minding both constraints was a good solution of compromise. The unit threshold was picked sort of arbitrarily, and suggested by Rob himself. The point of having it is avoiding huge number representations for BO size tallies that aren't aligned to the next unit, and also because BO size sums are scaled when plotting them on a Y axis, so complete accuracy isn't a requirement. >Regards, > >Tvrtko > >> sz = div_u64(sz, SZ_1K); >> } Adrian Larumbe
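To make the combined rule concrete, here is a small userspace model of the proposed print_size() selection logic, mirroring the hunk quoted earlier in this thread. The unit suffixes and byte-level behaviour are illustrative, not lifted verbatim from drm_file.c:

```c
#include <stdio.h>
#include <stdint.h>

#define SZ_1K 1024ULL
#define UPPER_UNIT_THRESHOLD 100ULL	/* from the patch quoted above */

/* Promote to the next larger unit only while the value is exactly
 * divisible by 1024 (no precision lost), or so large (>= 100 * 1024
 * in the current unit) that the lower unit would be unwieldy anyway. */
static void format_size(char *buf, size_t buflen, uint64_t sz)
{
	static const char *units[] = { "", " KiB", " MiB", " GiB" };
	unsigned int u;

	for (u = 0; u < sizeof(units) / sizeof(units[0]) - 1; u++) {
		if ((sz & (SZ_1K - 1)) && sz < UPPER_UNIT_THRESHOLD * SZ_1K)
			break;
		sz /= SZ_1K;
	}
	snprintf(buf, buflen, "%llu%s", (unsigned long long)sz, units[u]);
}
```

Under this rule, 36496 KiB stays in KiB (unaligned and below the threshold), while an exact 290 MiB is promoted all the way to MiB, matching the example fdinfo output shown earlier in the series.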
Re: [PATCH v6 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
On 20.09.2023 16:53, Tvrtko Ursulin wrote: > >On 20/09/2023 00:34, Adrián Larumbe wrote: >> Some BO's might be mapped onto physical memory chunkwise and on demand, >> like Panfrost's tiler heap. In this case, even though the >> drm_gem_shmem_object page array might already be allocated, only a very >> small fraction of the BO is currently backed by system memory, but >> drm_show_memory_stats will then proceed to add its entire virtual size to >> the file's total resident size regardless. >> >> This led to very unrealistic RSS sizes being reckoned for Panfrost, where >> said tiler heap buffer is initially allocated with a virtual size of 128 >> MiB, but only a small part of it will eventually be backed by system memory >> after successive GPU page faults. >> >> Provide a new DRM object generic function that would allow drivers to >> return a more accurate RSS size for their BOs. >> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/drm_file.c | 5 - >> include/drm/drm_gem.h | 9 + >> 2 files changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 883d83bc0e3d..762965e3d503 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, >> struct drm_file *file) >> } >> if (s & DRM_GEM_OBJECT_RESIDENT) { >> -status.resident += obj->size; >> +if (obj->funcs && obj->funcs->rss) >> +status.resident += obj->funcs->rss(obj); >> +else >> +status.resident += obj->size; > >Presumably you'd want the same smaller size in both active and purgeable? Or >you can end up with more in those two than in rss which would look odd. I didn't think of this. I guess when an object is both resident and purgeable, then its RSS and purgeable sizes should be the same. 
>Also, alternative to adding a new callback could be adding multiple output >parameters to the existing obj->func->status() which maybe ends up simpler due >fewer callbacks? > >Like: > > s = obj->funcs->status(obj, _status, ) > >And adjust the code flow to pick up the rss if driver signaled it supports >reporting it. I personally find having a separate object callback more readable in this case. There's also the question of what output parameter value would be used as a token that the relevant BO doesn't have an RSS different from its virtual size. I guess '0' would be alright, but this is on the assumption that this could never be a legitimate BO virtual size across all DRM drivers. I guess most of them round the size up to the nearest page multiple at BO creation time. > >Regards, > >Tvrtko > >> } else { >> /* If already purged or not yet backed by pages, don't >> * count it as purgeable: >> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h >> index bc9f6aa2f3fe..16364487fde9 100644 >> --- a/include/drm/drm_gem.h >> +++ b/include/drm/drm_gem.h >> @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { >> */ >> enum drm_gem_object_status (*status)(struct drm_gem_object *obj); >> +/** >> + * @rss: >> + * >> + * Return resident size of the object in physical memory. >> + * >> + * Called by drm_show_memory_stats(). >> + */ >> +size_t (*rss)(struct drm_gem_object *obj); >> + >> /** >> * @vm_ops: >> *
Re: [PATCH v6 2/6] drm/panfrost: Add fdinfo support GPU load metrics
On 20.09.2023 16:40, Tvrtko Ursulin wrote: >On 20/09/2023 00:34, Adrián Larumbe wrote: >> The drm-stats fdinfo tags made available to user space are drm-engine, >> drm-cycles, drm-max-freq and drm-curfreq, one per job slot. >> >> This deviates from standard practice in other DRM drivers, where a single >> set of key:value pairs is provided for the whole render engine. However, >> Panfrost has separate queues for fragment and vertex/tiler jobs, so a >> decision was made to calculate bus cycles and workload times separately. >> >> Maximum operating frequency is calculated at devfreq initialisation time. >> Current frequency is made available to user space because nvtop uses it >> when performing engine usage calculations. >> >> It is important to bear in mind that both GPU cycle and kernel time numbers >> provided are at best rough estimations, and always reported in excess from >> the actual figure because of two reasons: >> - Excess time because of the delay between the end of a job processing, >> the subsequent job IRQ and the actual time of the sample. >> - Time spent in the engine queue waiting for the GPU to pick up the next >> job. >> >> To avoid race conditions during enablement/disabling, a reference counting >> mechanism was introduced, and a job flag that tells us whether a given job >> increased the refcount. This is necessary, because user space can toggle >> cycle counting through a debugfs file, and a given job might have been in >> flight by the time cycle counting was disabled. >> >> The main goal of the debugfs cycle counter knob is letting tools like nvtop >> or IGT's gputop switch it at any time, to avoid power waste in case no >> engine usage measuring is necessary. 
>> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/panfrost/Makefile | 2 + >> drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 >> drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + >> drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ >> drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ >> drivers/gpu/drm/panfrost/panfrost_device.c | 2 + >> drivers/gpu/drm/panfrost/panfrost_device.h | 13 + >> drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - >> drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ >> drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ >> drivers/gpu/drm/panfrost/panfrost_job.c | 24 + >> drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ >> 12 files changed, 191 insertions(+), 1 deletion(-) >> create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c >> create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h >> >> diff --git a/drivers/gpu/drm/panfrost/Makefile >> b/drivers/gpu/drm/panfrost/Makefile >> index 7da2b3f02ed9..2c01c1e7523e 100644 >> --- a/drivers/gpu/drm/panfrost/Makefile >> +++ b/drivers/gpu/drm/panfrost/Makefile >> @@ -12,4 +12,6 @@ panfrost-y := \ >> panfrost_perfcnt.o \ >> panfrost_dump.o >> +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o >> + >> obj-$(CONFIG_DRM_PANFROST) += panfrost.o >> diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> b/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> new file mode 100644 >> index ..cc14eccba206 >> --- /dev/null >> +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> @@ -0,0 +1,20 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* Copyright 2023 Collabora ltd. 
*/ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#include "panfrost_device.h" >> +#include "panfrost_gpu.h" >> +#include "panfrost_debugfs.h" >> + >> +void panfrost_debugfs_init(struct drm_minor *minor) >> +{ >> +struct drm_device *dev = minor->dev; >> +struct panfrost_device *pfdev = >> platform_get_drvdata(to_platform_device(dev->dev)); >> + >> +debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, >> &pfdev->profile_mode); >> +} >> diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> b/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> new file mode 100644 >> index ..db1c158bcf2f >> --- /dev/null >> +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> @@ -0,0 +1,13 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */
[PATCH v6 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
- Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
- Split render engine values into fragment and vertex/tiler ones.
- Added more fine-grained calculation of RSS size for BO's.
- Implemented selection of drm-memory region size units
- Removed locking of shrinker's mutex in GEM obj status function

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
- Changed fdinfo engine names to something more descriptive
- Mentioned GPU cycle counts aren't an exact measure
- Handled the case when job->priv might be NULL
- Handled 32-bit overflow of cycle register
- Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers
- Removed special handling of Prime imported BO RSS
- Use rss_size only for heap objects
- Use bo->base.madv instead of specific purgeable flag
- Fixed kernel test robot warnings

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
- Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy
- Make sure cycle counter refs are released in reset path
- Drop the model param for toggling cycle counting and leave it to the debugfs file
- Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead.
- Remove fdinfo data nested structure definition and 'names' field
- When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping.
- drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
- Removed explicit initialisation of atomic variable for profiling mode, as it's allocated with kzalloc.
- Pass engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
- Remove double reading of cycle counter register and ktime in job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
- Moved putting of cycle counting refcnt into panfrost job dequeue function to avoid repetition.

v6:
- Fix wrong swapped-round engine time and cycle values in fdinfo drm print statements.

Adrián Larumbe (6):
  drm/panfrost: Add cycle count GPU register definitions
  drm/panfrost: Add fdinfo support GPU load metrics
  drm/panfrost: Add fdinfo support for memory stats
  drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  drm/panfrost: Implement generic DRM object RSS reporting function
  drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

 drivers/gpu/drm/drm_file.c | 10 +++-
 drivers/gpu/drm/panfrost/Makefile | 2 +
 drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++
 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 +
 drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++
 drivers/gpu/drm/panfrost/panfrost_device.c | 2 +
 drivers/gpu/drm/panfrost/panfrost_device.h | 13 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 59 -
 drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++
 drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++
 drivers/gpu/drm/panfrost/panfrost_job.c | 24 +
drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 250 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
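The v4 changelog item about releasing cycle-counter references (and the "job flag that tells us whether a given job increased the refcount" mentioned in patch 2/6) can be modelled with a small toy: jobs take a reference only while profiling is enabled and remember whether they did, so a job still in flight when the debugfs knob is flipped releases exactly what it took. All names here are illustrative, not the driver's (which uses atomics in panfrost_gpu.c/panfrost_job.c):

```c
#include <stdbool.h>

/* Toy model of the cycle-counter refcounting described in the series. */
struct gpu {
	bool profile_mode;  /* the debugfs "profile" knob */
	int cycle_refcnt;   /* HW cycle counting is on while > 0 */
};

struct job {
	bool took_ref;      /* did this job bump the refcount? */
};

static void job_submit(struct gpu *g, struct job *j)
{
	/* Record at submit time whether profiling was on. */
	j->took_ref = g->profile_mode;
	if (j->took_ref)
		g->cycle_refcnt++;
}

static void job_done(struct gpu *g, struct job *j)
{
	/* Release only what this job actually took, regardless of the
	 * knob's current state. */
	if (j->took_ref)
		g->cycle_refcnt--;
}
```

Without the per-job flag, a job submitted while counting was enabled but completed after user space disabled it would either leak a reference or drop one it never took.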
[PATCH v6 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v6 2/6] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess from the actual figure because of two reasons: - Excess time because of the delay between the end of a job processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary, because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled. The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct therm
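A rough sketch of how the per-slot fdinfo pairs described in this patch might be rendered, using the tag names from the commit message ("drm-engine", "drm-cycles", "drm-max-freq", "drm-curfreq"). This is a userspace-style approximation with a plain buffer; the real driver formats through a struct drm_printer in panfrost_gpu_show_fdinfo(), and the exact key spellings are those of the merged driver, not this sketch:

```c
#include <stdio.h>
#include <stdint.h>

/* Format one job slot's worth of fdinfo key:value pairs into buf.
 * 'slot' would be something like "frg" or "vtx" for Panfrost's
 * fragment and vertex/tiler queues. Returns snprintf's count. */
static int show_slot(char *buf, size_t len, const char *slot,
		     uint64_t engine_ns, uint64_t cycles,
		     unsigned int maxfreq_hz, unsigned int curfreq_hz)
{
	return snprintf(buf, len,
			"drm-engine-%s:\t%llu ns\n"
			"drm-cycles-%s:\t%llu\n"
			"drm-max-freq-%s:\t%u Hz\n"
			"drm-curfreq-%s:\t%u Hz\n",
			slot, (unsigned long long)engine_ns,
			slot, (unsigned long long)cycles,
			slot, maxfreq_hz,
			slot, curfreq_hz);
}
```

Tools like nvtop and IGT's gputop diff these values between two reads of the fdinfo file to derive a utilisation percentage per queue.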
[PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision in contexts of low memory usage. The new selection criteria try to preserve precision, whilst also increasing the display unit selection threshold to render more accurate values. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..34cfa128ffe5 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e) } EXPORT_SYMBOL(drm_send_event); +#define UPPER_UNIT_THRESHOLD 100 + static void print_size(struct drm_printer *p, const char *stat, const char *region, u64 sz) { @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if ((sz & (SZ_1K - 1)) && + sz < UPPER_UNIT_THRESHOLD * SZ_1K) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
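The patched unit-selection loop in print_size() above only promotes to a larger unit while the value is still an exact KiB multiple or at least 100 units of the next size up, so small or unaligned sizes keep their precision. A standalone re-implementation of just that loop (the kernel version iterates over a `units[]` string array; here we only return the chosen unit index):

```c
#include <stdint.h>

#define UPPER_UNIT_THRESHOLD 100
#define SZ_1K 1024ULL

/* Divide *sz down by 1024 while promotion loses no precision
 * (exact multiple) or the value is large enough (>= 100 of the
 * next unit) that the loss is negligible. Returns the unit index:
 * 0 = bytes, 1 = KiB, 2 = MiB, 3 = GiB. */
static unsigned int pick_unit(uint64_t *sz)
{
	unsigned int u;

	for (u = 0; u < 3; u++) {
		if ((*sz & (SZ_1K - 1)) &&
		    *sz < UPPER_UNIT_THRESHOLD * SZ_1K)
			break;
		*sz /= SZ_1K;
	}
	return u;
}
```

So 512 bytes stays "512" in bytes, 4096 becomes "4 KiB", and an unaligned-but-large value like 150000 bytes is still promoted to "146 KiB" because the rounding error there is under 1%.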
[PATCH v6 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ 
b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
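The accounting added by this patch can be modelled in isolation: every successful GPU page-fault mapping grows the heap BO's RSS by one 2 MiB chunk (the granularity named in the cover letter), and RSS reporting uses that tally rather than the BO's virtual size. Struct and function names below are demo stand-ins:

```c
#include <stdint.h>
#include <stddef.h>

#define SZ_2M (2u << 20)

/* Toy heap BO: fully sized virtually at creation, backed lazily. */
struct heap_bo {
	size_t virt_size;      /* e.g. 128 MiB tiler heap */
	size_t heap_rss_size;  /* grows as faults are serviced */
};

/* Models the MMU fault handler's `bo->heap_rss_size += SZ_2M;`:
 * one more 2 MiB chunk is now backed by system memory. */
static void fault_in_chunk(struct heap_bo *bo)
{
	if (bo->heap_rss_size + SZ_2M <= bo->virt_size)
		bo->heap_rss_size += SZ_2M;
}
```

After two faults, such a BO reports 4 MiB resident instead of its full 128 MiB virtual size, matching the motivation in patch 4/6.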
[PATCH v6 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
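Since GPU_CYCLE_COUNT_LO/HI expose a 64-bit counter as two 32-bit registers, a reader has to guard against a carry from LO into HI between the two accesses. One conventional pattern (a sketch, not the driver's actual read path — `reg_read()` stands in for gpu_read(), with a fake counter variable as the "hardware"):

```c
#include <stdint.h>

/* Fake hardware state for the demo; a real driver reads MMIO. */
static uint64_t counter;

static uint32_t reg_read(int hi)
{
	return hi ? (uint32_t)(counter >> 32) : (uint32_t)counter;
}

/* Wrap-safe 64-bit read of a split LO/HI counter: re-read HI until
 * it is stable around the LO read, so a carry between the accesses
 * cannot produce a torn value (e.g. old HI with wrapped-around LO). */
static uint64_t read_cycle_count(void)
{
	uint32_t hi, lo;

	do {
		hi = reg_read(1);
		lo = reg_read(0);
	} while (hi != reg_read(1));

	return ((uint64_t)hi << 32) | lo;
}
```

This is the kind of issue the v3 changelog item "Handled 32 bit overflow of cycle register" refers to: a naive `(hi << 32) | lo` with a single pair of reads can be off by 2^32 cycles around a wrap.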
[PATCH v6 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 3c93a11deab1..8cd9331ac4b8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(>base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
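The flag computation in panfrost_gem_status() above reduces to two independent bits: purgeable if user space marked the BO MADV_DONTNEED, resident if pages are allocated. A self-contained sketch with local stand-ins for the DRM enum values and BO fields:

```c
#include <stdbool.h>

/* Local stand-ins for DRM_GEM_OBJECT_{RESIDENT,PURGEABLE} and
 * PANFROST_MADV_DONTNEED; values are illustrative. */
enum { OBJ_RESIDENT = 1 << 0, OBJ_PURGEABLE = 1 << 1 };
#define MADV_DONTNEED_DEMO 1

struct demo_bo {
	int madv;       /* madvise state set via the driver's ioctl */
	bool has_pages; /* shmem page array allocated? */
};

static unsigned int bo_status(const struct demo_bo *bo)
{
	unsigned int res = 0;

	if (bo->madv == MADV_DONTNEED_DEMO)
		res |= OBJ_PURGEABLE;
	if (bo->has_pages)
		res |= OBJ_RESIDENT;
	return res;
}
```

As the commit message notes, the madv field is read here without the shrinker mutex, so the result is a best-effort snapshot rather than a synchronised one.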
[PATCH v5 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 
+36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v5 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision in contexts of low memory usage. The new selection criteria try to preserve precision, whilst also increasing the display unit selection threshold to render more accurate values. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_file.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..34cfa128ffe5 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e) } EXPORT_SYMBOL(drm_send_event); +#define UPPER_UNIT_THRESHOLD 100 + static void print_size(struct drm_printer *p, const char *stat, const char *region, u64 sz) { @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if ((sz & (SZ_1K - 1)) && + sz < UPPER_UNIT_THRESHOLD * SZ_1K) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
[PATCH v5 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v5 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
[PATCH v5 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index a8d02273afab..ef6563cf5f7e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(>base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v5 2/6] drm/panfrost: Add fdinfo support for GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both the GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons: - Excess time because of the delay between the end of a job's processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enabling/disabling, a reference counting mechanism was introduced, along with a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might still be in flight by the time cycle counting is disabled. The main goal of the debugfs cycle counter knob is to let tools like nvtop or IGT's gputop toggle it at any time, to avoid wasting power when no engine usage measurement is needed. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct thermal_cooling_de
[PATCH v5 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be read with cat by a privileged user or accessed with IGT's gputop utility. Changelog: v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/ v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/ - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets. - Split render engine values into fragment and vertex/tiler ones. - Added more fine-grained calculation of RSS size for BOs. - Implemented selection of drm-memory region size units - Removed locking of shrinker's mutex in GEM obj status function v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/ - Changed fdinfo engine names to something more descriptive - Mentioned GPU cycle counts aren't an exact measure - Handled the case when job->priv might be NULL - Handled 32 bit overflow of cycle register - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers - Removed special handling of Prime imported BO RSS - Use rss_size only for heap objects - Use bo->base.madv instead of specific purgeable flag - Fixed kernel test robot warnings v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/ - Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy - Make sure cycle counter refs are released in reset path - Drop the module param for toggling cycle counting and leave it down to the debugfs file - Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead. 
- Remove fdinfo data nested structure definition and 'names' field - When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping. - drm-file picks an fdinfo memory object size unit that doesn't lose precision. v5: - Removed explicit initialisation of atomic variable for profiling mode, as it's allocated with kzalloc. - Pass engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter. - Remove double reading of cycle counter register and ktime in job dequeue function, as the scheduler will make sure these values are read again in case of requeuing. - Moved putting of cycle counting refcnt into panfrost job dequeue function to avoid repetition. Adrián Larumbe (6): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support for GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats drivers/gpu/drm/drm_file.c | 10 +++- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++ drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 59 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 250 insertions(+), 3 
deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
[PATCH v4 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 2d9c115821a7..e71a89a283cd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v4 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision for BOs whose size is not a multiple of a MiB. Fix it by changing the unit selection criteria. For much bigger BOs, their size will naturally be aligned on something bigger than a 4 KiB page, so in practice it is very unlikely their display unit would default to KiB. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..bf7d2fe46bfa 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if (sz & (SZ_1K - 1)) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
[PATCH v4 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v4 2/6] drm/panfrost: Add fdinfo support for GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both the GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons: - Excess time because of the delay between the end of a job's processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enabling/disabling, a reference counting mechanism was introduced, along with a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might still be in flight by the time cycle counting is disabled. The main goal of the debugfs cycle counter knob is to let tools like nvtop or IGT's gputop toggle it at any time, to avoid wasting power when no engine usage measurement is needed. 
Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 6 +++ drivers/gpu/drm/panfrost/panfrost_job.c | 39 ++ drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 208 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct thermal_cooling_device *cooling; struct panfrost_d
[PATCH v4 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be read with cat by a privileged user or accessed with IGT's gputop utility. Changelog: v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/ v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/ - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets. - Split render engine values into fragment and vertex/tiler ones. - Added more fine-grained calculation of RSS size for BOs. - Implemented selection of drm-memory region size units - Removed locking of shrinker's mutex in GEM obj status function v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/ - Changed fdinfo engine names to something more descriptive - Mentioned GPU cycle counts aren't an exact measure - Handled the case when job->priv might be NULL - Handled 32 bit overflow of cycle register - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers - Removed special handling of Prime imported BO RSS - Use rss_size only for heap objects - Use bo->base.madv instead of specific purgeable flag - Fixed kernel test robot warnings v4: - Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy - Make sure cycle counter refs are released in reset path - Drop the module param for toggling cycle counting and leave it down to the debugfs file - Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead. 
- Remove fdinfo data nested structure definition and 'names' field - When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping. - drm-file picks an fdinfo memory object size unit that doesn't lose precision. Adrián Larumbe (6): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support for GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats drivers/gpu/drm/drm_file.c | 7 ++- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++ drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 59 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 6 +++ drivers/gpu/drm/panfrost/panfrost_job.c | 39 ++ drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 264 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
[PATCH v4 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = &drm_gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct 
panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v4 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR 0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START 0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN 0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE 0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
Re: [PATCH v3 8/8] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 06.09.2023 10:11, Boris Brezillon wrote: >On Tue, 5 Sep 2023 19:45:24 +0100 >Adrián Larumbe wrote: > >> The current implementation will try to pick the highest available size >> display unit as soon as the BO size exceeds that of the previous >> multiplier. >> >> By selecting a higher threshold, we could show more accurate size numbers. >> >> Signed-off-by: Adrián Larumbe >> --- >> drivers/gpu/drm/drm_file.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 762965e3d503..0b5fbd493e05 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const char >> *stat, >> unsigned u; >> >> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> -if (sz < SZ_1K) >> +if (sz < (SZ_1K * 1)) >> break; > >This threshold looks a bit random. How about picking a unit that allows >us to print the size with no precision loss? > > for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { > if (sz & (SZ_1K - 1)) > break; > } In this case I picked up on Rob Clark's suggestion of choosing a hard limit of perhaps 10k or 100k times the current unit before moving on to the next one. While this approach guarantees that we don't lose precision, it would render a tad too long a number in KiB for BO's that aren't a multiple of a MiB. >> sz = div_u64(sz, SZ_1K); >> }