Re: [PATCH v4 1/4] drm/panthor: introduce job cycle and timestamp accounting
Hi Steven, thanks for the remarks.

On 19.07.2024 15:14, Steven Price wrote:
> On 16/07/2024 21:11, Adrián Larumbe wrote:
> > Enable calculations of job submission times in clock cycles and wall
> > time. This is done by expanding the boilerplate command stream when running
> > a job to include instructions that compute said times right before and after
> > a user CS.
> >
> > Those numbers are stored in the queue's group's sync objects BO, right
> > after them. Because the queues in a group might have a different number of
> > slots, one must keep track of the overall slot tally when reckoning the
> > offset of a queue's time sample structs, one for each slot.
> >
> > This commit is done in preparation for enabling DRM fdinfo support in the
> > Panthor driver, which depends on the numbers calculated herein.
> >
> > A profile mode device flag has been added that will in a future commit
> > allow UM to toggle time sampling behaviour, which is disabled by default to
> > save power. It also enables marking jobs as being profiled and picks one of
> > two call instruction arrays to insert into the ring buffer. One of them
> > includes FW logic to sample the timestamp and cycle counter registers and
> > write them into the job's syncobj, and the other does not.
> >
> > A profiled job's call sequence takes up two ring buffer slots, and this is
> > reflected when initialising the DRM scheduler for each queue, with a
> > profiled job contributing twice as many credits.
> >
> > Signed-off-by: Adrián Larumbe
>
> Thanks for the updates, this looks better. A few minor comments below.
>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h |   2 +
> >  drivers/gpu/drm/panthor/panthor_sched.c  | 244 ---
> >  2 files changed, 216 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index e388c0472ba7..3ede2f80df73 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -162,6 +162,8 @@ struct panthor_device {
> >  		 */
> >  		struct page *dummy_latest_flush;
> >  	} pm;
> > +
> > +	bool profile_mode;
> >  };
> >
> >  /**
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index 79ffcbc41d78..6438e5ea1f2b 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -93,6 +93,9 @@
> >  #define MIN_CSGS 3
> >  #define MAX_CSG_PRIO 0xf
> >
> > +#define NUM_INSTRS_PER_SLOT 16
> > +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64))
> > +
> >  struct panthor_group;
> >
> >  /**
> > @@ -466,6 +469,9 @@ struct panthor_queue {
> >  		 */
> >  		struct list_head in_flight_jobs;
> >  	} fence_ctx;
> > +
> > +	/** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */
> > +	unsigned long time_offset;
>
> AFAICT this doesn't need to be stored. We could just pass this value
> into group_create_queue() as an extra parameter where it's used.

I think we need to keep this offset value around: queues within the same
group can have different numbers of slots, so when fetching the sampled
values from the syncobjs BO in update_fdinfo_stats, we would otherwise have
to traverse the entire array of a queue's preceding queues and work out
their size in slots, so as to know how many struct panthor_job_times
samples to jump over after the preceding syncobj array.

> >  };
> >
> >  /**
> > @@ -592,7 +598,17 @@ struct panthor_group {
> >  	 * One sync object per queue. The position of the sync object is
> >  	 * determined by the queue index.
> >  	 */
> > -	struct panthor_kernel_bo *syncobjs;
> > +
> > +	struct {
> > +		/** @bo: Kernel BO holding the sync objects. */
> > +		struct panthor_kernel_bo *bo;
> > +
> > +		/**
> > +		 * @job_times_offset: Beginning of panthor_job_times struct samples after
> > +		 * the group's array of sync objects.
> > +		 */
> > +		size_t job_times_offset;
> > +	} syncobjs;
> >
> >  	/** @state: Group state. */
> >  	enum panthor_group_state state;
> > @@ -651,6 +667,18 @@ struct panthor_group {
> >  	struct li
[PATCH v4 3/4] drm/panthor: enable fdinfo for memory stats
Implement the drm object's status callback. Also, we consider a PRIME-imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage has already been allocated.

Signed-off-by: Adrián Larumbe
Reviewed-by: Liviu Dudau
---
 drivers/gpu/drm/panthor/panthor_gem.c | 12
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 38f560864879..c60b599665d8 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -145,6 +145,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags)
 	return drm_gem_prime_export(obj, flags);
 }
 
+static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj)
+{
+	struct panthor_gem_object *bo = to_panthor_bo(obj);
+	enum drm_gem_object_status res = 0;
+
+	if (bo->base.base.import_attach || bo->base.pages)
+		res |= DRM_GEM_OBJECT_RESIDENT;
+
+	return res;
+}
+
 static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.free = panthor_gem_free_object,
 	.print_info = drm_gem_shmem_object_print_info,
@@ -154,6 +165,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = panthor_gem_mmap,
+	.status = panthor_gem_status,
 	.export = panthor_gem_prime_export,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
-- 
2.45.1
[PATCH v4 2/4] drm/panthor: add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can accumulate the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Display of fdinfo key-value pairs is governed by a flag that is disabled by default in the present commit; supporting a manual toggle of it will be the matter of a later commit.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_devfreq.c | 18 -
 drivers/gpu/drm/panthor/panthor_device.h  | 10 +
 drivers/gpu/drm/panthor/panthor_drv.c     | 33
 drivers/gpu/drm/panthor/panthor_sched.c   | 47 +++
 4 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c
index c6d3c327cc24..9d0f891b9b53 100644
--- a/drivers/gpu/drm/panthor/panthor_devfreq.c
+++ b/drivers/gpu/drm/panthor/panthor_devfreq.c
@@ -62,14 +62,20 @@ static void panthor_devfreq_update_utilization(struct panthor_devfreq *pdevfreq)
 static int panthor_devfreq_target(struct device *dev, unsigned long *freq,
 				  u32 flags)
 {
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
 	struct dev_pm_opp *opp;
+	int err;
 
 	opp = devfreq_recommended_opp(dev, freq, flags);
 	if (IS_ERR(opp))
 		return PTR_ERR(opp);
 	dev_pm_opp_put(opp);
 
-	return dev_pm_opp_set_rate(dev, *freq);
+	err = dev_pm_opp_set_rate(dev, *freq);
+	if (!err)
+		ptdev->current_frequency = *freq;
+
+	return err;
 }
 
 static void panthor_devfreq_reset(struct panthor_devfreq *pdevfreq)
@@ -130,6 +136,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 	struct panthor_devfreq *pdevfreq;
 	struct dev_pm_opp *opp;
 	unsigned long cur_freq;
+	unsigned long freq = ULONG_MAX;
 	int ret;
 
 	pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL);
@@ -161,6 +168,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 		return PTR_ERR(opp);
 
 	panthor_devfreq_profile.initial_freq = cur_freq;
+	ptdev->current_frequency = cur_freq;
 
 	/* Regulator coupling only takes care of synchronizing/balancing voltage
 	 * updates, but the coupled regulator needs to be enabled manually.
@@ -204,6 +212,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 
 	dev_pm_opp_put(opp);
 
+	/* Find the fastest defined rate */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	ptdev->fast_rate = freq;
+
+	dev_pm_opp_put(opp);
+
 	/*
 	 * Setup default thresholds for the simple_ondemand governor.
 	 * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 3ede2f80df73..4536fbf43a4e 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -163,9 +163,16 @@ struct panthor_device {
 		struct page *dummy_latest_flush;
 	} pm;
 
+	unsigned long current_frequency;
+	unsigned long fast_rate;
 	bool profile_mode;
 };
 
+struct panthor_gpu_usage {
+	u64 time;
+	u64 cycles;
+};
+
 /**
  * struct panthor_file - Panthor file
  */
@@ -178,6 +185,9 @@ struct panthor_file {
 
 	/** @groups: Scheduling group pool attached to this file. */
 	struct panthor_group_pool *groups;
+
+	/** @stats: cycle and timestamp measures for job execution. */
+	struct panthor_gpu_usage stats;
 };
 
 int panthor_device_init(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index b8a84f26b3ef..6a0c1a06a709 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -3,12 +3,17 @@
 /* Copyright 2019 Linaro, Ltd., Rob Herring */
 /* Copyright 2019 Collabora ltd. */
 
+#ifdef CONFIG_ARM_ARCH_TIMER
+#include <clocksource/arm_arch_timer.h>
+#endif
+
 #include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1351,6 +1356,32 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
 	return ret;
 }
 
+static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
+				    struct panthor_file *pfile,
+				    struct drm_printer *p)
+{
+	if (ptdev->profile_mode) {
+#ifdef CONFIG_ARM_ARCH_TIMER
+		drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
+			   DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
+					    arch_timer_get_cntfrq()));
+#endif
+		drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
+	}
+	drm_print
[PATCH v4 4/4] drm/panthor: add sysfs knob for enabling job profiling
This commit introduces a DRM device sysfs attribute that lets UM control the job accounting status in the device. The knob variable was brought in as part of a previous commit, and now we're able to toggle it manually.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_drv.c | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 6a0c1a06a709..a2876310856f 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1448,6 +1448,41 @@ static void panthor_remove(struct platform_device *pdev)
 	panthor_device_unplug(ptdev);
 }
 
+static ssize_t profiling_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%d\n", ptdev->profile_mode);
+}
+
+static ssize_t profiling_store(struct device *dev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+	bool value;
+	int err;
+
+	err = kstrtobool(buf, &value);
+	if (err)
+		return err;
+
+	ptdev->profile_mode = value;
+
+	return len;
+}
+
+static DEVICE_ATTR_RW(profiling);
+
+static struct attribute *panthor_attrs[] = {
+	&dev_attr_profiling.attr,
+	NULL,
+};
+
+ATTRIBUTE_GROUPS(panthor);
+
 static const struct of_device_id dt_match[] = {
 	{ .compatible = "rockchip,rk3588-mali" },
 	{ .compatible = "arm,mali-valhall-csf" },
@@ -1467,6 +1502,7 @@ static struct platform_driver panthor_driver = {
 		.name = "panthor",
 		.pm = pm_ptr(&panthor_pm_ops),
 		.of_match_table = dt_match,
+		.dev_groups = panthor_groups,
 	},
 };
-- 
2.45.1
[PATCH v4 1/4] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS.

Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot.

This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein.

A profile mode device flag has been added that will, in a future commit, allow UM to toggle time sampling behaviour; it is disabled by default to save power. It also enables marking jobs as being profiled, and picks one of two call instruction arrays to insert into the ring buffer. One of them includes FW logic to sample the timestamp and cycle counter registers and write them into the job's syncobj, and the other does not.

A profiled job's call sequence takes up two ring buffer slots, and this is reflected when initialising the DRM scheduler for each queue, with a profiled job contributing twice as many credits.
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_device.h |   2 +
 drivers/gpu/drm/panthor/panthor_sched.c  | 244 ---
 2 files changed, 216 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index e388c0472ba7..3ede2f80df73 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -162,6 +162,8 @@ struct panthor_device {
 		 */
 		struct page *dummy_latest_flush;
 	} pm;
+
+	bool profile_mode;
 };
 
 /**
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 79ffcbc41d78..6438e5ea1f2b 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -93,6 +93,9 @@
 #define MIN_CSGS 3
 #define MAX_CSG_PRIO 0xf
 
+#define NUM_INSTRS_PER_SLOT 16
+#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64))
+
 struct panthor_group;
 
 /**
@@ -466,6 +469,9 @@ struct panthor_queue {
 		 */
 		struct list_head in_flight_jobs;
 	} fence_ctx;
+
+	/** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */
+	unsigned long time_offset;
 };
 
 /**
@@ -592,7 +598,17 @@ struct panthor_group {
 	 * One sync object per queue. The position of the sync object is
 	 * determined by the queue index.
 	 */
-	struct panthor_kernel_bo *syncobjs;
+
+	struct {
+		/** @bo: Kernel BO holding the sync objects. */
+		struct panthor_kernel_bo *bo;
+
+		/**
+		 * @job_times_offset: Beginning of panthor_job_times struct samples after
+		 * the group's array of sync objects.
+		 */
+		size_t job_times_offset;
+	} syncobjs;
 
 	/** @state: Group state. */
 	enum panthor_group_state state;
@@ -651,6 +667,18 @@ struct panthor_group {
 	struct list_head wait_node;
 };
 
+struct panthor_job_times {
+	struct {
+		u64 before;
+		u64 after;
+	} cycles;
+
+	struct {
+		u64 before;
+		u64 after;
+	} time;
+};
+
 /**
  * group_queue_work() - Queue a group work
  * @group: Group to queue the work for.
@@ -730,6 +758,9 @@ struct panthor_job {
 	/** @queue_idx: Index of the queue inside @group. */
 	u32 queue_idx;
 
+	/** @ringbuf_idx: Index of the ringbuffer inside @queue. */
+	u32 ringbuf_idx;
+
 	/** @call_info: Information about the userspace command stream call. */
 	struct {
 		/** @start: GPU address of the userspace command stream. */
@@ -764,6 +795,9 @@ struct panthor_job {
 
 	/** @done_fence: Fence signaled when the job is finished or cancelled. */
 	struct dma_fence *done_fence;
+
+	/** @is_profiled: Whether timestamp and cycle numbers were gathered for this job */
+	bool is_profiled;
 };
 
 static void
@@ -844,7 +878,7 @@ static void group_release_work(struct work_struct *work)
 	panthor_kernel_bo_destroy(group->suspend_buf);
 	panthor_kernel_bo_destroy(group->protm_suspend_buf);
-	panthor_kernel_bo_destroy(group->syncobjs);
+	panthor_kernel_bo_destroy(group->syncobjs.bo);
 	panthor_vm_put(group->vm);
 
 	kfree(group);
@@ -1969,8 +2003,6 @@ tick_ctx_init(struct panthor_scheduler *sched,
 	}
 }
 
-#define NUM_INSTRS_PER_SLOT 16
-
 static void group_term_post_processing(struct panth
[PATCH v4 0/4] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation.

Previous discussion can be found at
https://lore.kernel.org/dri-devel/dqhnxhgho6spfh7xhw6yvs2iiqeqzeg63e6jqqpw2g7gkrfphn@dojsixyl4esv/

Changelog:

v4:
- Fixed wrong assignment location for frequency values in Panthor's devfreq
- Removed the last two commits about registering size of internal BO's
- Rearranged patch series so that sysfs knob is done last and all the
  previous time sampling and fdinfo show dependencies are already in place

v3:
- Fixed some nits and removed useless bounds check in panthor_sched.c
- Added support for sysfs profiling knob and optional job accounting
- Added new patches for calculating size of internal BO's

v2:
- Split original first patch in two, one for FW CS cycle and timestamp
  calculations and job accounting memory management, and a second one
  that enables fdinfo.
- Moved NUM_INSTRS_PER_SLOT to the file prelude
- Removed nelem variable from the group's struct definition.
- Precompute size of group's syncobj BO to avoid code duplication.
- Some minor nits.

Adrián Larumbe (4):
  drm/panthor: introduce job cycle and timestamp accounting
  drm/panthor: add DRM fdinfo support
  drm/panthor: enable fdinfo for memory stats
  drm/panthor: add sysfs knob for enabling job profiling

 drivers/gpu/drm/panthor/panthor_devfreq.c |  18 +-
 drivers/gpu/drm/panthor/panthor_device.h  |  12 +
 drivers/gpu/drm/panthor/panthor_drv.c     |  69 +
 drivers/gpu/drm/panthor/panthor_gem.c     |  12 +
 drivers/gpu/drm/panthor/panthor_sched.c   | 291 +++---
 5 files changed, 371 insertions(+), 31 deletions(-)

-- 
2.45.1
Re: [PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
Hi Steven, thanks for the review. On 13.06.2024 16:28, Steven Price wrote: > On 06/06/2024 01:49, Adrián Larumbe wrote: > > This patch series enables userspace utilities like gputop and nvtop to > > query a render context's fdinfo file and figure out rates of engine > > and memory utilisation. > > > > Previous discussion can be found at > > https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ > > > > Changelog: > > v3: > > - Fixed some nits and removed useless bounds check in panthor_sched.c > > - Added support for sysfs profiling knob and optional job accounting > > - Added new patches for calculating size of internal BO's > > v2: > > - Split original first patch in two, one for FW CS cycle and timestamp > > calculations and job accounting memory management, and a second one > > that enables fdinfo. > > - Moved NUM_INSTRS_PER_SLOT to the file prelude > > - Removed nelem variable from the group's struct definition. > > - Precompute size of group's syncobj BO to avoid code duplication. > > - Some minor nits. > > > > > > Adrián Larumbe (7): > > drm/panthor: introduce job cycle and timestamp accounting > > drm/panthor: add DRM fdinfo support > > drm/panthor: enable fdinfo for memory stats > > drm/panthor: add sysfs knob for enabling job profiling > > drm/panthor: support job accounting > > drm/drm_file: add display of driver's internal memory size > > drm/panthor: register size of internal objects through fdinfo > > The general shape of what you end up with looks correct, but these > patches are now in a bit of a mess. It's confusing to review when the > accounting is added unconditionally and then a sysfs knob is added which > changes it all to be conditional. Equally that last patch (register size > of internal objects through fdinfo) includes a massive amount of churn > moving everything into an 'fdinfo' struct which really should be in a > separate patch. 
> > Ideally this needs to be reworked into a logical series of patches with > knowledge of what's coming next. E.g. the first patch could introduce > the code for cycle/timestamp accounting but leave it disabled to be then > enabled by the sysfs knob patch. > > One thing I did notice though is that I wasn't seeing the GPU frequency > change, looking more closely at this it seems like there's something > dodgy going on with the devfreq code. From what I can make out I often > end up in a situation where all contexts are idle every time tick_work() > is called - I think this is simply because tick_work() is scheduled with > a delay and by the time the delay has hit the work is complete. Nothing > to do with this series, but something that needs looking into. I'm on > holiday for a week but I'll try to look at this when I'm back. I've found why the current frequency value wasn't updating when manually adjusting the device's devfreq governor. Fix will be part of the next patch series revision. 
Adrian > Steve > > > Documentation/gpu/drm-usage-stats.rst | 4 + > > drivers/gpu/drm/drm_file.c| 9 +- > > drivers/gpu/drm/msm/msm_drv.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.c | 2 + > > drivers/gpu/drm/panthor/panthor_device.h | 21 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 83 +- > > drivers/gpu/drm/panthor/panthor_fw.c | 16 +- > > drivers/gpu/drm/panthor/panthor_fw.h | 5 +- > > drivers/gpu/drm/panthor/panthor_gem.c | 67 - > > drivers/gpu/drm/panthor/panthor_gem.h | 16 +- > > drivers/gpu/drm/panthor/panthor_heap.c| 23 +- > > drivers/gpu/drm/panthor/panthor_heap.h| 6 +- > > drivers/gpu/drm/panthor/panthor_mmu.c | 8 +- > > drivers/gpu/drm/panthor/panthor_mmu.h | 3 +- > > drivers/gpu/drm/panthor/panthor_sched.c | 304 +++--- > > include/drm/drm_file.h| 7 +- > > 18 files changed, 522 insertions(+), 66 deletions(-) > > > > > > base-commit: 310ec03841a36e3f45fb528f0dfdfe5b9e84b037
Re: [PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
Hi Steven, On 13.06.2024 16:28, Steven Price wrote: > On 06/06/2024 01:49, Adrián Larumbe wrote: > > This patch series enables userspace utilities like gputop and nvtop to > > query a render context's fdinfo file and figure out rates of engine > > and memory utilisation. > > > > Previous discussion can be found at > > https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ > > > > Changelog: > > v3: > > - Fixed some nits and removed useless bounds check in panthor_sched.c > > - Added support for sysfs profiling knob and optional job accounting > > - Added new patches for calculating size of internal BO's > > v2: > > - Split original first patch in two, one for FW CS cycle and timestamp > > calculations and job accounting memory management, and a second one > > that enables fdinfo. > > - Moved NUM_INSTRS_PER_SLOT to the file prelude > > - Removed nelem variable from the group's struct definition. > > - Precompute size of group's syncobj BO to avoid code duplication. > > - Some minor nits. > > > > > > Adrián Larumbe (7): > > drm/panthor: introduce job cycle and timestamp accounting > > drm/panthor: add DRM fdinfo support > > drm/panthor: enable fdinfo for memory stats > > drm/panthor: add sysfs knob for enabling job profiling > > drm/panthor: support job accounting > > drm/drm_file: add display of driver's internal memory size > > drm/panthor: register size of internal objects through fdinfo > > The general shape of what you end up with looks correct, but these > patches are now in a bit of a mess. It's confusing to review when the > accounting is added unconditionally and then a sysfs knob is added which > changes it all to be conditional. Equally that last patch (register size > of internal objects through fdinfo) includes a massive amount of churn > moving everything into an 'fdinfo' struct which really should be in a > separate patch. I do agree with you in that perhaps too many things change across successive patches in the series. 
I think I can explain this because of the way the series has evolved through successive revisions. In the last one of them, only the first three patches were present, and both Liviu and Boris seemed happy with the shape they had taken, but then Boris suggested adding the sysfs knob and optional profiling support rather than submitting them as part of a different series, like I had done in Panfrost. In that spirit, I decided to keep the first three patches intact.

The last two patches are a bit more of an afterthought, and because they touch on the drm fdinfo core, I understood they were more likely to be rejected for now, at least until Tvrtko and the other people involved in the development of fdinfo had agreed on a way to report internal bo sizes. However, being also part of fdinfo, I thought this series was a good place to spark a debate about them, even if they don't seem as seamlessly linked with the rest of the work.

> Ideally this needs to be reworked into a logical series of patches with
> knowledge of what's coming next. E.g. the first patch could introduce
> the code for cycle/timestamp accounting but leave it disabled to be then
> enabled by the sysfs knob patch.
>
> One thing I did notice though is that I wasn't seeing the GPU frequency
> change, looking more closely at this it seems like there's something
> dodgy going on with the devfreq code. From what I can make out I often
> end up in a situation where all contexts are idle every time tick_work()
> is called - I think this is simply because tick_work() is scheduled with
> a delay and by the time the delay has hit the work is complete. Nothing
> to do with this series, but something that needs looking into. I'm on
> holiday for a week but I'll try to look at this when I'm back.

Would you mind sharing what you do in UM to trigger this behaviour, and also maybe the debug traces you've written into the driver to confirm this?
> Steve > > > Documentation/gpu/drm-usage-stats.rst | 4 + > > drivers/gpu/drm/drm_file.c| 9 +- > > drivers/gpu/drm/msm/msm_drv.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.c | 2 + > > drivers/gpu/drm/panthor/panthor_device.h | 21 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 83 +- > > drivers/gpu/drm/panthor/panthor_fw.c | 16 +- > > drivers/gpu/drm/panthor/panthor_fw.h | 5 +- > > drivers/gpu/drm/panthor/panthor_gem.c | 67 - > > drivers/gpu/drm/panthor/panthor_gem.h | 16 +- >
[PATCH v3 7/7] drm/panthor: register size of internal objects through fdinfo
This includes both DRM objects created to support queues, groups and heaps, and also objects whose pages are shared between the GPU and the MCU. However, this doesn't include objects that hold the firmware's binary regions, since these aren't owned by a render context and are allocated only once at driver initialisation time.

Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panthor/panthor_device.c |  2 +
 drivers/gpu/drm/panthor/panthor_device.h | 13 +-
 drivers/gpu/drm/panthor/panthor_drv.c    | 20 ++---
 drivers/gpu/drm/panthor/panthor_fw.c     | 16 +--
 drivers/gpu/drm/panthor/panthor_fw.h     |  5 ++-
 drivers/gpu/drm/panthor/panthor_gem.c    | 55 ++--
 drivers/gpu/drm/panthor/panthor_gem.h    | 16 +--
 drivers/gpu/drm/panthor/panthor_heap.c   | 23 +++---
 drivers/gpu/drm/panthor/panthor_heap.h   |  6 ++-
 drivers/gpu/drm/panthor/panthor_mmu.c    |  8 +++-
 drivers/gpu/drm/panthor/panthor_mmu.h    |  3 +-
 drivers/gpu/drm/panthor/panthor_sched.c  | 19 
 12 files changed, 147 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 4082c8f2951d..868fa9aba570 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -179,6 +179,8 @@ int panthor_device_init(struct panthor_device *ptdev)
 	if (ret)
 		return ret;
 
+	drmm_mutex_init(&ptdev->base, &ptdev->private_obj_list_lock);
+
 	/*
 	 * Set the dummy page holding the latest flush to 1. This will cause the
 	 * flush to be avoided as we know it isn't necessary if the submission
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index c3ec1e31f8b7..d3abf9700887 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -166,6 +166,9 @@ struct panthor_device {
 	bool profile_mode;
 	unsigned long current_frequency;
 	unsigned long fast_rate;
+
+	/** @private_obj_list_lock: Lock around per-file lists of internal GEM objects */
+	struct mutex private_obj_list_lock;
 };
 
 struct panthor_gpu_usage {
@@ -186,8 +189,14 @@ struct panthor_file {
 	/** @groups: Scheduling group pool attached to this file. */
 	struct panthor_group_pool *groups;
 
-	/** @stats: cycle and timestamp measures for job execution. */
-	struct panthor_gpu_usage stats;
+	/** @fdinfo: Open file tracking information */
+	struct {
+		/** @stats: cycle and timestamp measures for job execution. */
+		struct panthor_gpu_usage stats;
+
+		/** @private_file_list: File's list of private GEM objects. */
+		struct list_head private_file_list;
+	} fdinfo;
 };
 
 int panthor_device_init(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index a2876310856f..20a1add84014 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1048,13 +1048,13 @@ static int panthor_ioctl_tiler_heap_create(struct drm_device *ddev, void *data,
 	if (!vm)
 		return -EINVAL;
 
-	pool = panthor_vm_get_heap_pool(vm, true);
+	pool = panthor_vm_get_heap_pool(vm, true, pfile);
 	if (IS_ERR(pool)) {
 		ret = PTR_ERR(pool);
 		goto out_put_vm;
 	}
 
-	ret = panthor_heap_create(pool,
+	ret = panthor_heap_create(pool, pfile,
 				  args->initial_chunk_count,
 				  args->chunk_size,
 				  args->max_chunks,
@@ -1094,7 +1094,7 @@ static int panthor_ioctl_tiler_heap_destroy(struct drm_device *ddev, void *data,
 	if (!vm)
 		return -EINVAL;
 
-	pool = panthor_vm_get_heap_pool(vm, false);
+	pool = panthor_vm_get_heap_pool(vm, false, NULL);
 	if (IS_ERR(pool)) {
 		ret = PTR_ERR(pool);
 		goto out_put_vm;
@@ -1268,6 +1268,8 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
 	pfile->ptdev = ptdev;
 
+	INIT_LIST_HEAD(&pfile->fdinfo.private_file_list);
+
 	ret = panthor_vm_pool_create(pfile);
 	if (ret)
 		goto err_free_file;
@@ -1295,6 +1297,12 @@ panthor_postclose(struct drm_device *ddev, struct drm_file *file)
 {
 	struct panthor_file *pfile = file->driver_priv;
 
+	/*
+	 * Group's internal BO's are destroyed asynchronously in a separate worker thread,
+	 * so there's a chance by the time BO release happens, the file is already gone.
+	 */
+	panthor_gem_dettach_internal_bos(pfile);
+
 	panthor_group_pool_destroy(pfile);
 	panthor_vm_pool_destroy(pfile);
 
@@ -1363,10 +1371,10 @@ static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
 	if (ptdev-
[PATCH v3 6/7] drm/drm_file: add display of driver's internal memory size
Some drivers must allocate a considerable amount of memory for bookkeeping structures and GPU's MCU-kernel shared communication regions. These are often created as a result of the invocation of the driver's ioctl() interface functions, so it is sensible to consider them as being owned by the render context associated with an open drm file. However, at the moment drm_show_memory_stats only traverses the UM-exposed drm objects for which a handle exists. Private driver objects and memory regions, though connected to a render context, are unaccounted for in their fdinfo numbers. Add a new drm_memory_stats 'internal' memory category. Because deciding what constitutes an 'internal' object and where to find these are driver-dependent, calculation of this size must be done through a driver-provided function pointer, which becomes the third argument of drm_show_memory_stats. Drivers which have no interest in exposing the size of internal memory objects can keep passing NULL for unaltered behaviour. Signed-off-by: Adrián Larumbe --- Documentation/gpu/drm-usage-stats.rst | 4 drivers/gpu/drm/drm_file.c | 9 +++-- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- include/drm/drm_file.h | 7 ++- 5 files changed, 19 insertions(+), 5 deletions(-) diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index 6dc299343b48..0da5ebecd232 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -157,6 +157,10 @@ The total size of buffers that are purgeable. The total size of buffers that are active on one or more engines. +- drm-internal-: [KiB|MiB] + +The total size of GEM objects that aren't exposed to user space. 
+ Implementation Details == diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 638ffaf5..d1c13eed8d34 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -874,9 +874,10 @@ void drm_print_memory_stats(struct drm_printer *p, enum drm_gem_object_status supported_status, const char *region) { - print_size(p, "total", region, stats->private + stats->shared); + print_size(p, "total", region, stats->private + stats->shared + stats->internal); print_size(p, "shared", region, stats->shared); print_size(p, "active", region, stats->active); + print_size(p, "internal", region, stats->internal); if (supported_status & DRM_GEM_OBJECT_RESIDENT) print_size(p, "resident", region, stats->resident); @@ -890,11 +891,12 @@ EXPORT_SYMBOL(drm_print_memory_stats); * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats * @p: the printer to print output to * @file: the DRM file + * @func: driver-specific function pointer to count the size of internal objects * * Helper to iterate over GEM objects with a handle allocated in the specified * file. 
*/ -void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) +void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file, internal_bos func) { struct drm_gem_object *obj; struct drm_memory_stats status = {}; @@ -940,6 +942,9 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } spin_unlock(&file->table_lock); + if (func) + func(&status, file); + drm_print_memory_stats(p, &status, supported_status, "memory"); } EXPORT_SYMBOL(drm_show_memory_stats); diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index 9c33f4e3f822..f97d3cdc4f50 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -880,7 +880,7 @@ static void msm_show_fdinfo(struct drm_printer *p, struct drm_file *file) msm_gpu_show_fdinfo(priv->gpu, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..53640ac44e42 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -570,7 +570,7 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); - drm_show_memory_stats(p, file); + drm_show_memory_stats(p, file, NULL); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index ab230d3af138..d71a5ac50ea9 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -464,6 +464,7 @@ void drm_send_event_timestamp_locked
[PATCH v3 5/7] drm/panthor: support job accounting
A previous commit brought in a sysfs knob to control the driver's profiling status. This changeset flags jobs as being profiled according to the driver's global profiling status, and picks one of two call instruction arrays to insert into the ring buffer. One of them includes FW logic to sample the timestamp and cycle counter registers and write them into the job's syncobj, and the other does not. A profiled job's call sequence takes up two ring buffer slots, and this is reflected when initialising the DRM scheduler for each queue, with a profiled job contributing twice as many credits. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_sched.c | 95 ++--- 1 file changed, 86 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index bbd20db40e7b..4fb6fc5c2314 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,7 +93,7 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf -#define NUM_INSTRS_PER_SLOT 32 +#define NUM_INSTRS_PER_SLOT 16 #define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) struct panthor_group; @@ -807,6 +807,9 @@ struct panthor_job { /** @done_fence: Fence signaled when the job is finished or cancelled.
*/ struct dma_fence *done_fence; + + /** @is_profiled: Whether timestamp and cycle numbers were gathered for this job */ + bool is_profiled; }; static void @@ -2865,7 +2868,8 @@ static void group_sync_upd_work(struct work_struct *work) dma_fence_end_signalling(cookie); list_for_each_entry_safe(job, job_tmp, _jobs, node) { - update_fdinfo_stats(job); + if (job->is_profiled) + update_fdinfo_stats(job); list_del_init(>node); panthor_job_put(>base); } @@ -2884,6 +2888,8 @@ queue_run_job(struct drm_sched_job *sched_job) u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf); u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1); u32 ringbuf_index = ringbuf_insert / (SLOTSIZE); + bool ringbuf_wraparound = + job->is_profiled && ((ringbuf_size/SLOTSIZE) == ringbuf_index + 1); u64 addr_reg = ptdev->csif_info.cs_reg_count - ptdev->csif_info.unpreserved_cs_reg_count; u64 val_reg = addr_reg + 2; @@ -2893,12 +2899,51 @@ queue_run_job(struct drm_sched_job *sched_job) job->queue_idx * sizeof(struct panthor_syncobj_64b); u64 times_addr = panthor_kernel_bo_gpuva(group->syncobjs.bo) + queue->time_offset + (ringbuf_index * sizeof(struct panthor_job_times)); + size_t call_insrt_size; + u64 *call_instrs; u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0); struct dma_fence *done_fence; int ret; - u64 call_instrs[NUM_INSTRS_PER_SLOT] = { + u64 call_instrs_simple[NUM_INSTRS_PER_SLOT] = { + /* MOV32 rX+2, cs.latest_flush */ + (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, + + /* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */ + (36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233, + + /* MOV48 rX:rX+1, cs.start */ + (1ull << 56) | (addr_reg << 48) | job->call_info.start, + + /* MOV32 rX+2, cs.size */ + (2ull << 56) | (val_reg << 48) | job->call_info.size, + + /* WAIT(0) => waits for FLUSH_CACHE2 instruction */ + (3ull << 56) | (1 << 16), + + /* CALL rX:rX+1, rX+2 */ + (32ull << 56) | (addr_reg << 40) | (val_reg << 32), + + /* MOV48 
rX:rX+1, sync_addr */ + (1ull << 56) | (addr_reg << 48) | sync_addr, + + /* MOV48 rX+2, #1 */ + (1ull << 56) | (val_reg << 48) | 1, + + /* WAIT(all) */ + (3ull << 56) | (waitall_mask << 16), + + /* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/ + (51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1, + + /* ERROR_BARRIER, so we can recover from faults at job +* boundaries. +*/ + (47ull << 56), + }; + + u64 call_instrs_profile[NUM_INSTRS_PER_SLOT*2] = { /* MOV32 rX+2, cs.latest_flush */ (2ull << 56) | (val_reg << 48) | job->call_info.latest_flush, @@ -2960,9 +3005,18 @@
[PATCH v3 2/7] drm/panthor: add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can increase the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 +++ drivers/gpu/drm/panthor/panthor_sched.c | 46 +++ 4 files changed, 98 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index c6d3c327cc24..5eededaeade7 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index e388c0472ba7..8a0260a7b90a 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ struct page *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index b8a84f26b3ef..6d25385e02a1 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,12 +3,17 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. 
*/ +#ifdef CONFIG_ARM_ARCH_TIMER +#include +#endif + #include #include #include #include #include #include +#include #include #include @@ -1351,6 +1356,30 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1360,6 +1389,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mmap, + .show_fdinfo = drm_show_fdinfo, }; #ifdef CONFIG_DEBUG_FS @@ -1378,6 +1408,7 @@ static const struct drm_driver panthor_drm_driver = { DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, .open = panthor_
[PATCH v3 1/7] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS. Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot. NUM_INSTRS_PER_SLOT had to be increased to 32 because of adding new FW instructions for storing and subtracting the cycle counter and timestamp register, and it must always remain a power of two. This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein. Signed-off-by: Adrián Larumbe Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_sched.c | 156 1 file changed, 132 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index 79ffcbc41d78..62a67d6bd37a 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,6 +93,9 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf +#define NUM_INSTRS_PER_SLOT 32 +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) + struct panthor_group; /** @@ -466,6 +469,9 @@ struct panthor_queue { */ struct list_head in_flight_jobs; } fence_ctx; + + /** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */ + unsigned long time_offset; }; /** @@ -592,7 +598,17 @@ struct panthor_group { * One sync object per queue. The position of the sync object is * determined by the queue index. */ - struct panthor_kernel_bo *syncobjs; + + struct { + /** @bo: Kernel BO holding the sync objects.
*/ + struct panthor_kernel_bo *bo; + + /** + * @job_times_offset: Beginning of panthor_job_times struct samples after + * the group's array of sync objects. + */ + size_t job_times_offset; + } syncobjs; /** @state: Group state. */ enum panthor_group_state state; @@ -651,6 +667,18 @@ struct panthor_group { struct list_head wait_node; }; +struct panthor_job_times { + struct { + u64 before; + u64 after; + } cycles; + + struct { + u64 before; + u64 after; + } time; +}; + /** * group_queue_work() - Queue a group work * @group: Group to queue the work for. @@ -730,6 +758,9 @@ struct panthor_job { /** @queue_idx: Index of the queue inside @group. */ u32 queue_idx; + /** @ringbuf_idx: Index of the ringbuffer inside @queue. */ + u32 ringbuf_idx; + /** @call_info: Information about the userspace command stream call. */ struct { /** @start: GPU address of the userspace command stream. */ @@ -844,7 +875,7 @@ static void group_release_work(struct work_struct *work) panthor_kernel_bo_destroy(group->suspend_buf); panthor_kernel_bo_destroy(group->protm_suspend_buf); - panthor_kernel_bo_destroy(group->syncobjs.bo); + panthor_kernel_bo_destroy(group->syncobjs.bo); panthor_vm_put(group->vm); kfree(group); @@ -1969,8 +2000,6 @@ tick_ctx_init(struct panthor_scheduler *sched, } } -#define NUM_INSTRS_PER_SLOT 16 - static void group_term_post_processing(struct panthor_group *group) { @@ -2007,7 +2036,7 @@ group_term_post_processing(struct panthor_group *group) spin_unlock(&queue->fence_ctx.lock); /* Manually update the syncobj seqno to unblock waiters.
*/ - syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (i * sizeof(*syncobj)); syncobj->status = ~0; syncobj->seqno = atomic64_read(&queue->fence_ctx.seqno); sched_queue_work(group->ptdev->scheduler, sync_upd); @@ -2780,7 +2809,7 @@ static void group_sync_upd_work(struct work_struct *work) if (!queue) continue; - syncobj = group->syncobjs->kmap + (queue_idx * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (queue_idx * sizeof(*syncobj)); spin_lock(&queue->fence_ctx.lock); list_for_each_entry_safe(job, job_tmp, &queue->fence_ctx.in_flight_jobs, node) { @@ -2815,11 +2844,17 @@ queue_run_job(struct drm_sched_job *sched_job) struct panthor_scheduler *sched = ptde
[PATCH v3 4/7] drm/panthor: add sysfs knob for enabling job profiling
Just like it is already present in Panfrost, this commit introduces a DRM device sysfs file that lets UM control the job accounting status in the device. The present commit only brings in the sysfs knob and also hides the cycles and engine fdinfo tags when it's disabled, but leveraging it for job accounting will be the matter of a later commit. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_device.h | 1 + drivers/gpu/drm/panthor/panthor_drv.c| 46 +--- 2 files changed, 43 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 8a0260a7b90a..c3ec1e31f8b7 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -163,6 +163,7 @@ struct panthor_device { struct page *dummy_latest_flush; } pm; + bool profile_mode; unsigned long current_frequency; unsigned long fast_rate; }; diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index 6d25385e02a1..a2876310856f 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1360,12 +1360,14 @@ static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, struct panthor_file *pfile, struct drm_printer *p) { + if (ptdev->profile_mode) { #ifdef CONFIG_ARM_ARCH_TIMER - drm_printf(p, "drm-engine-panthor:\t%llu ns\n", - DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), - arch_timer_get_cntfrq())); + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); #endif - drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + } drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); } @@ -1446,6 +1448,41 @@ static void panthor_remove(struct platform_device *pdev) 
panthor_device_unplug(ptdev); } +static ssize_t profiling_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct panthor_device *ptdev = dev_get_drvdata(dev); + + return sysfs_emit(buf, "%d\n", ptdev->profile_mode); +} + +static ssize_t profiling_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t len) +{ + struct panthor_device *ptdev = dev_get_drvdata(dev); + bool value; + int err; + + err = kstrtobool(buf, &value); + if (err) + return err; + + ptdev->profile_mode = value; + + return len; +} + +static DEVICE_ATTR_RW(profiling); + +static struct attribute *panthor_attrs[] = { + &dev_attr_profiling.attr, + NULL, +}; + +ATTRIBUTE_GROUPS(panthor); + static const struct of_device_id dt_match[] = { { .compatible = "rockchip,rk3588-mali" }, { .compatible = "arm,mali-valhall-csf" }, @@ -1465,6 +1502,7 @@ static struct platform_driver panthor_driver = { .name = "panthor", .pm = pm_ptr(&panthor_pm_ops), .of_match_table = dt_match, + .dev_groups = panthor_groups, }, }; -- 2.45.1
[PATCH v3 3/7] drm/panthor: enable fdinfo for memory stats
Implement the DRM object's status callback. Also, we consider a PRIME imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage had already been allocated. Signed-off-by: Adrián Larumbe Reviewed-by: Liviu Dudau --- drivers/gpu/drm/panthor/panthor_gem.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index 38f560864879..c60b599665d8 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -145,6 +145,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.base.import_attach || bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -154,6 +165,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.45.1
[PATCH v3 0/7] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Previous discussion can be found at https://lore.kernel.org/dri-devel/20240423213240.91412-1-adrian.laru...@collabora.com/ Changelog: v3: - Fixed some nits and removed useless bounds check in panthor_sched.c - Added support for sysfs profiling knob and optional job accounting - Added new patches for calculating size of internal BO's v2: - Split original first patch in two, one for FW CS cycle and timestamp calculations and job accounting memory management, and a second one that enables fdinfo. - Moved NUM_INSTRS_PER_SLOT to the file prelude - Removed nelem variable from the group's struct definition. - Precompute size of group's syncobj BO to avoid code duplication. - Some minor nits. Adrián Larumbe (7): drm/panthor: introduce job cycle and timestamp accounting drm/panthor: add DRM fdinfo support drm/panthor: enable fdinfo for memory stats drm/panthor: add sysfs knob for enabling job profiling drm/panthor: support job accounting drm/drm_file: add display of driver's internal memory size drm/panthor: register size of internal objects through fdinfo Documentation/gpu/drm-usage-stats.rst | 4 + drivers/gpu/drm/drm_file.c| 9 +- drivers/gpu/drm/msm/msm_drv.c | 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 +- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.c | 2 + drivers/gpu/drm/panthor/panthor_device.h | 21 ++ drivers/gpu/drm/panthor/panthor_drv.c | 83 +- drivers/gpu/drm/panthor/panthor_fw.c | 16 +- drivers/gpu/drm/panthor/panthor_fw.h | 5 +- drivers/gpu/drm/panthor/panthor_gem.c | 67 - drivers/gpu/drm/panthor/panthor_gem.h | 16 +- drivers/gpu/drm/panthor/panthor_heap.c| 23 +- drivers/gpu/drm/panthor/panthor_heap.h| 6 +- drivers/gpu/drm/panthor/panthor_mmu.c | 8 +- drivers/gpu/drm/panthor/panthor_mmu.h | 3 +- drivers/gpu/drm/panthor/panthor_sched.c | 304 +++--- 
include/drm/drm_file.h| 7 +- 18 files changed, 522 insertions(+), 66 deletions(-) base-commit: 310ec03841a36e3f45fb528f0dfdfe5b9e84b037 -- 2.45.1
[PATCH v4 2/3] drm/lima: Fix dma_resv deadlock at drm object pin time
Commit a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin(), which also tries to lock the same dma_resv, leading to a double lock situation. As was already done for Panfrost in the previous commit, fix it by replacing drm_gem_shmem_pin() with its locked variant. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..9bb997dbb4b9 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -185,7 +185,7 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_pin_locked(&bo->base); } static int lima_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map) -- 2.45.1
[PATCH v4 3/3] drm/gem-shmem: Add import attachment warning to locked pin function
Commit ec144244a43f ("drm/gem-shmem: Acquire reservation lock in GEM pin/unpin callbacks") moved locking DRM object's dma reservation to drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so we need to make sure the non-NULL check warning is also added to the latter. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..ad5d9f704e15 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -233,6 +233,8 @@ int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem) dma_resv_assert_held(shmem->base.resv); + drm_WARN_ON(shmem->base.dev, shmem->base.import_attach); + ret = drm_gem_shmem_get_pages(shmem); return ret; -- 2.45.1
[PATCH v4 0/3] drm: Fix dma_resv deadlock at drm object pin time
This is v4 of https://lore.kernel.org/lkml/20240521181817.097af...@collabora.com/T/ The goal of this patch series is fixing a deadlock upon locking the dma reservation of a DRM gem object when pinning it, at a prime import operation. Changelog: v3: - Split driver fixes into separate commits for Panfrost and Lima - Make drivers call drm_gem_shmem_pin_locked instead of drm_gem_shmem_object_pin - Improved commit message for first patch to explain why dma resv locking in the pin callback is no longer necessary. v2: - Removed comment explaining reason why an already-locked pin function replaced the locked variant inside Panfrost's object pin callback. - Moved already-assigned attachment warning into generic already-locked gem object pin function Adrián Larumbe (3): drm/panfrost: Fix dma_resv deadlock at drm object pin time drm/lima: Fix dma_resv deadlock at drm object pin time drm/gem-shmem: Add import attachment warning to locked pin function drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) base-commit: 7acacca1b157fcb258cfd781603425f73bc7370b -- 2.45.1
[PATCH v4 1/3] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost's GEM object pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by replacing drm_gem_shmem_pin() with its locked version, as the lock had already been taken by drm_gem_pin(). Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..8e0ff3efede7 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,7 +192,7 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_pin_locked(&bo->base); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.45.1
Re: [PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time
Hi Boris and Thomas, On 02.05.2024 14:18, Thomas Zimmermann wrote: > Hi > > Am 02.05.24 um 14:00 schrieb Boris Brezillon: > > On Thu, 2 May 2024 13:59:41 +0200 > > Boris Brezillon wrote: > > > > > Hi Thomas, > > > > > > On Thu, 2 May 2024 13:51:16 +0200 > > > Thomas Zimmermann wrote: > > > > > > > Hi, > > > > > > > > ignoring my r-b on patch 1, I'd like to rethink the current patches in > > > > general. > > > > > > > > I think drm_gem_shmem_pin() should become the locked version of _pin(), > > > > so that drm_gem_shmem_object_pin() can call it directly. The existing > > > > _pin_unlocked() would not be needed any longer. Same for the _unpin() > > > > functions. This change would also fix the consistency with the semantics > > > > of the shmem _vmap() functions, which never take reservation locks. > > > > > > > > There are only two external callers of drm_gem_shmem_pin(): the test > > > > case and panthor. These assume that drm_gem_shmem_pin() acquires the > > > > reservation lock. The test case should likely call drm_gem_pin() > > > > instead. That would acquire the reservation lock and the test would > > > > validate that shmem's pin helper integrates well into the overall GEM > > > > framework. The way panthor uses drm_gem_shmem_pin() looks wrong to me. > > > > For now, it could receive a wrapper that takes the lock and that's it. > > > I do agree that the current inconsistencies in the naming is > > > troublesome (sometimes _unlocked, sometimes _locked, with the version > > > without any suffix meaning either _locked or _unlocked depending on > > > what the suffixed version does), and that's the very reason I asked > > > Dmitry to address that in his shrinker series [1]. So, ideally I'd > > > prefer if patches from Dmitry's series were applied instead of > > > trying to fix that here (IIRC, we had an ack from Maxime). > > With the link this time :-). > > > > [1]https://lore.kernel.org/lkml/20240105184624.508603-1-dmitry.osipe...@collabora.com/T/ > > Thanks. 
I remember these patches. Somehow I thought they would have been > merged already. I wasn't super happy about the naming changes in patch 5, > because the names of the GEM object callbacks do no longer correspond with > their implementations. But anyway. > > If we go that direction, we should here simply push drm_gem_shmem_pin() and > drm_gem_shmem_unpin() into panthor and update the shmem tests with > drm_gem_pin(). Panfrost and lima would call drm_gem_shmem_pin_locked(). IMHO > we should not promote the use of drm_gem_shmem_object_*() functions, as they > are meant to be callbacks for struct drm_gem_object_funcs. (Auto-generating > them would be nice.) I'll be doing this in the next patch series iteration, casting the pin function's drm object parameter to an shmem object. Also for the sake of leaving things in a consistent state, and against Boris' advice, I think I'll leave the drm WARN statement inside drm_gem_shmem_pin_locked. I guess even though Dmitry's working on it, rebasing his work on top of this minor change shouldn't be an issue. Cheers, Adrian Larumbe > Best regards > Thomas > > > > > > > Regards, > > > > > > Boris
Re: [PATCH v2 3/3] drm/panthor: Enable fdinfo for memory stats
On 24.04.2024 18:34, Liviu Dudau wrote: > Hello, > > On Tue, Apr 23, 2024 at 10:32:36PM +0100, Adrián Larumbe wrote: > > When vm-binding an already-created BO, the entirety of its virtual size is > > then backed by system memory, so its RSS is always the same as its virtual > > size. > > How is that relevant to this patch? Or to put it differently: how are your > words describing your code change here? I think I wrote this as a throw-back to the time when we handled RSS calculations for Panfrost objects, because heap BO's would be mapped on demand at every page fault. I understand that without mention of this the remark seems out of context, so depending on your taste I can either expand the message to underline this, or perhaps drop it altogether. I think I'd rather go for the latter, since the fact that panthor_gem_funcs includes no binding for drm_gem_object_funcs::rss() should be enough of a hint at this. > > > > Also, we consider a PRIME imported BO to be resident if its matching > > dma_buf has an open attachment, which means its backing storage had already > > been allocated. 
> > Reviewed-by: Liviu Dudau > > Best regards, > Liviu > > > > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panthor/panthor_gem.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_gem.c > > b/drivers/gpu/drm/panthor/panthor_gem.c > > index d6483266d0c2..386c0dfeeb5f 100644 > > --- a/drivers/gpu/drm/panthor/panthor_gem.c > > +++ b/drivers/gpu/drm/panthor/panthor_gem.c > > @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, > > int flags) > > return drm_gem_prime_export(obj, flags); > > } > > > > +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object > > *obj) > > +{ > > + struct panthor_gem_object *bo = to_panthor_bo(obj); > > + enum drm_gem_object_status res = 0; > > + > > + if (bo->base.base.import_attach || bo->base.pages) > > + res |= DRM_GEM_OBJECT_RESIDENT; > > + > > + return res; > > +} > > + > > static const struct drm_gem_object_funcs panthor_gem_funcs = { > > .free = panthor_gem_free_object, > > .print_info = drm_gem_shmem_object_print_info, > > @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs > > panthor_gem_funcs = { > > .vmap = drm_gem_shmem_object_vmap, > > .vunmap = drm_gem_shmem_object_vunmap, > > .mmap = panthor_gem_mmap, > > + .status = panthor_gem_status, > > .export = panthor_gem_prime_export, > > .vm_ops = _gem_shmem_vm_ops, > > }; > > -- > > 2.44.0 > > Adrian Larumbe
Re: [PATCH v2 1/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
Hi Daniel, On 02.05.2024 10:09, Daniel Vetter wrote: > On Wed, May 01, 2024 at 07:50:43PM +0100, Adrián Larumbe wrote: > > Up to this day, all fdinfo-based GPU profilers must traverse the entire > > /proc directory structure to find open DRM clients with fdinfo file > > descriptors. This is inefficient and time-consuming. > > > > This patch adds a new DRM ioctl that allows users to obtain a list of PIDs > > for clients who have opened the DRM device. Output from the ioctl isn't > > human-readable, and it's meant to be retrieved only by GPU profilers like > > gputop and nvtop. > > > > Cc: Rob Clark > > Cc: Tvrtko Ursulin > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/drm_internal.h | 1 + > > drivers/gpu/drm/drm_ioctl.c| 89 ++ > > include/uapi/drm/drm.h | 7 +++ > > 3 files changed, 97 insertions(+) > > > > diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h > > index 690505a1f7a5..6f78954cae16 100644 > > --- a/drivers/gpu/drm/drm_internal.h > > +++ b/drivers/gpu/drm/drm_internal.h > > @@ -243,6 +243,7 @@ static inline void drm_debugfs_encoder_remove(struct > > drm_encoder *encoder) > > drm_ioctl_t drm_version; > > drm_ioctl_t drm_getunique; > > drm_ioctl_t drm_getclient; > > +drm_ioctl_t drm_getclients; > > > > /* drm_syncobj.c */ > > void drm_syncobj_open(struct drm_file *file_private); > > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c > > index e368fc084c77..da7057376581 100644 > > --- a/drivers/gpu/drm/drm_ioctl.c > > +++ b/drivers/gpu/drm/drm_ioctl.c > > @@ -207,6 +207,93 @@ int drm_getclient(struct drm_device *dev, void *data, > > } > > } > > > > +/* > > + * Get list of client PIDs who have opened a DRM file > > + * > > + * \param dev DRM device we are querying > > + * \param data IOCTL command input. > > + * \param file_priv DRM file private. > > + * > > + * \return zero on success or a negative number on failure. 
> > + * > > + * Traverses list of open clients for the given DRM device, and > > + * copies them into userspace as an array of PIDs > > + */ > > +int drm_getclients(struct drm_device *dev, void *data, > > + struct drm_file *file_priv) > > + > > +{ > > + struct drm_get_clients *get_clients = data; > > + ssize_t size = get_clients->len; > > + char __user *pid_buf; > > + ssize_t offset = 0; > > + int ret = 0; > > + > > + /* > > +* We do not want to show clients of display only devices so > > +* as to avoid confusing UM GPU profilers > > +*/ > > + if (!dev->render) { > > + get_clients->len = 0; > > + return 0; > > + } > > + > > + /* > > +* An input size of zero means UM wants to know the size of the PID > > buffer > > +* We round it up to the nearest multiple of the page size so that we > > can have > > +* some spare headroom in case more clients came in between successive > > calls > > +* of this ioctl, and also to simplify parsing of the PIDs buffer, > > because > > +* sizeof(pid_t) will hopefully always divide PAGE_SIZE > > +*/ > > + if (size == 0) { > > + get_clients->len = > > + roundup(atomic_read(&dev->open_count) * sizeof(pid_t), > > PAGE_SIZE); > > + return 0; > > + } > > + > > + pid_buf = (char *)(void *)get_clients->user_data; > > + > > + if (!pid_buf) > > + return -EINVAL; > > + > > + mutex_lock(&dev->filelist_mutex); > > + list_for_each_entry_reverse(file_priv, &dev->filelist, lhead) { > > + pid_t pid_num; > > + > > + if ((size - offset) < sizeof(pid_t)) > > + break; > > + > > + rcu_read_lock(); > > + pid_num = pid_vnr(rcu_dereference(file_priv->pid)); > > + rcu_read_unlock(); > > + > > + /* We do not want to return the profiler's PID */ > > + if (pid_vnr(task_tgid(current)) == pid_num) > > + continue; > > + > > + ret = copy_to_user(pid_buf + offset, &pid_num, sizeof(pid_t)); > > + if (ret) > > + break; > > + > > + offset += sizeof(pid_t); > > + } > > + mutex_unlock(&dev->filelist_mutex); > > + > > + if (ret) > > + retu
[PATCH v2 1/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
Up to this day, all fdinfo-based GPU profilers must traverse the entire /proc directory structure to find open DRM clients with fdinfo file descriptors. This is inefficient and time-consuming. This patch adds a new DRM ioctl that allows users to obtain a list of PIDs for clients who have opened the DRM device. Output from the ioctl isn't human-readable, and it's meant to be retrieved only by GPU profilers like gputop and nvtop. Cc: Rob Clark Cc: Tvrtko Ursulin Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_internal.h | 1 + drivers/gpu/drm/drm_ioctl.c| 89 ++ include/uapi/drm/drm.h | 7 +++ 3 files changed, 97 insertions(+) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 690505a1f7a5..6f78954cae16 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -243,6 +243,7 @@ static inline void drm_debugfs_encoder_remove(struct drm_encoder *encoder) drm_ioctl_t drm_version; drm_ioctl_t drm_getunique; drm_ioctl_t drm_getclient; +drm_ioctl_t drm_getclients; /* drm_syncobj.c */ void drm_syncobj_open(struct drm_file *file_private); diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index e368fc084c77..da7057376581 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -207,6 +207,93 @@ int drm_getclient(struct drm_device *dev, void *data, } } +/* + * Get list of client PIDs who have opened a DRM file + * + * \param dev DRM device we are querying + * \param data IOCTL command input. + * \param file_priv DRM file private. + * + * \return zero on success or a negative number on failure. 
+ * + * Traverses list of open clients for the given DRM device, and + * copies them into userspace as an array of PIDs + */ +int drm_getclients(struct drm_device *dev, void *data, + struct drm_file *file_priv) + +{ + struct drm_get_clients *get_clients = data; + ssize_t size = get_clients->len; + char __user *pid_buf; + ssize_t offset = 0; + int ret = 0; + + /* +* We do not want to show clients of display only devices so +* as to avoid confusing UM GPU profilers +*/ + if (!dev->render) { + get_clients->len = 0; + return 0; + } + + /* +* An input size of zero means UM wants to know the size of the PID buffer +* We round it up to the nearest multiple of the page size so that we can have +* some spare headroom in case more clients came in between successive calls +* of this ioctl, and also to simplify parsing of the PIDs buffer, because +* sizeof(pid_t) will hopefully always divide PAGE_SIZE +*/ + if (size == 0) { + get_clients->len = + roundup(atomic_read(&dev->open_count) * sizeof(pid_t), PAGE_SIZE); + return 0; + } + + pid_buf = (char *)(void *)get_clients->user_data; + + if (!pid_buf) + return -EINVAL; + + mutex_lock(&dev->filelist_mutex); + list_for_each_entry_reverse(file_priv, &dev->filelist, lhead) { + pid_t pid_num; + + if ((size - offset) < sizeof(pid_t)) + break; + + rcu_read_lock(); + pid_num = pid_vnr(rcu_dereference(file_priv->pid)); + rcu_read_unlock(); + + /* We do not want to return the profiler's PID */ + if (pid_vnr(task_tgid(current)) == pid_num) + continue; + + ret = copy_to_user(pid_buf + offset, &pid_num, sizeof(pid_t)); + if (ret) + break; + + offset += sizeof(pid_t); + } + mutex_unlock(&dev->filelist_mutex); + + if (ret) + return -EFAULT; + + if ((size - offset) >= sizeof(pid_t)) { + pid_t pid_zero = 0; + + ret = copy_to_user(pid_buf + offset, + &pid_zero, sizeof(pid_t)); + if (ret) + return -EFAULT; + } + + return 0; +} + /* * Get statistics information.
* @@ -672,6 +759,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER), DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER), DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER), + + DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENTS, drm_getclients, DRM_RENDER_ALLOW), }; #define DRM_CORE_IOCTL_COUNT ARRAY_SIZE(drm_ioctls) diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 16122819edfe..c47aa9de51ab 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -1024,6 +1024,11 @@ struct drm_crtc_queue_sequence { __u64 user_data;/* user d
[PATCH v2 0/1] drm: Add ioctl for querying a DRM device's list of open client PIDs
This is v2 of the patch being discussed at https://lore.kernel.org/dri-devel/20240403182951.724488-1-adrian.laru...@collabora.com/ In the original patch, a DRM device sysfs attribute file was chosen as the interface for fetching the list of active client PIDs. That came with a host of problems: - Normal device attributes can only send back up to a page worth of data, which might not be enough if many clients are opening the DRM device. - The binary attribute interface is meant for immutable virtual files, but the list of active PIDs can grow and shrink between successive calls of show(). This led me to believe sysfs is not the right tool for the job, so I switched over to a custom DRM ioctl that does the same thing. In order to test this patch, one can use WIP branches for both libdrm and igt at: https://gitlab.freedesktop.org/larumbe/igt-gpu-tools/-/tree/drm-clients-ioctl?ref_type=heads https://gitlab.freedesktop.org/larumbe/drm/-/tree/drm-clients-ioctl?ref_type=heads I've only tested it with gputop, but intel-gputop should work also. Adrián Larumbe (1): drm: Add ioctl for querying a DRM device's list of open client PIDs drivers/gpu/drm/drm_internal.h | 1 + drivers/gpu/drm/drm_ioctl.c| 89 ++ include/uapi/drm/drm.h | 7 +++ 3 files changed, 97 insertions(+) -- 2.44.0
[PATCH v3 2/2] drm/gem-shmem: Add import attachment warning to locked pin function
Commit ec144244a43f ("drm/gem-shmem: Acquire reservation lock in GEM pin/unpin callbacks") moved locking DRM object's dma reservation to drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so we need to make sure the non-NULL check warning is also added to the latter. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 13bcdbfd..ad5d9f704e15 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -233,6 +233,8 @@ int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem) dma_resv_assert_held(shmem->base.resv); + drm_WARN_ON(shmem->base.dev, shmem->base.import_attach); + ret = drm_gem_shmem_get_pages(shmem); return ret; -- 2.44.0
[PATCH v3 0/2] drm: Fix dma_resv deadlock at drm object pin time
This is v3 of https://lore.kernel.org/dri-devel/20240424090429.57de7...@collabora.com/ The goal of this patch series is fixing a deadlock upon locking the dma reservation of a DRM gem object when pinning it, at a prime import operation. Changes from v2: - Removed comment explaining reason why an already-locked pin function replaced the locked variant inside Panfrost's object pin callback. - Moved already-assigned attachment warning into generic already-locked gem object pin function Adrián Larumbe (2): drm/panfrost: Fix dma_resv deadlock at drm object pin time drm/gem-shmem: Add import attachment warning to locked pin function drivers/gpu/drm/drm_gem_shmem_helper.c | 2 ++ drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) base-commit: 75b68f22e39aafb22f3d8e3071e1aba73560788c -- 2.44.0
[PATCH v3 1/2] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost GEM object's pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by assuming the object's reservation had already been locked by the time we reach panfrost_gem_pin. Do the same thing for the Lima driver, as it most likely suffers from the same issue. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 2 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..c4e0f9faaa47 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -185,7 +185,7 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_object_pin(obj); } static int lima_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..f268bd5c2884 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,
+192,7 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + return drm_gem_shmem_object_pin(obj); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.44.0
Re: [PATCH v2 3/4] drm/panthor: Relax the constraints on the tiler chunk size
Hi Boris, On 30.04.2024 13:28, Boris Brezillon wrote: > The field used to store the chunk size is 12 bits wide, and the encoding > is chunk_size = chunk_header.chunk_size << 12, which gives us a > theoretical [4k:8M] range. This range is further limited by > implementation constraints, and all known implementations seem to > impose a [128k:8M] range, so do the same here. > > We also relax the power-of-two constraint, which doesn't seem to > exist on v10. This will allow userspace to fine-tune initial/max > tiler memory on memory-constrained devices. > > v2: > - Turn the power-of-two constraint into a page-aligned constraint to allow > fine-tune of the initial/max heap memory size > - Fix the panthor_heap_create() kerneldoc > > Fixes: 9cca48fa4f89 ("drm/panthor: Add the heap logical block") > Signed-off-by: Boris Brezillon > --- > drivers/gpu/drm/panthor/panthor_heap.c | 8 > include/uapi/drm/panthor_drm.h | 6 +- > 2 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_heap.c > b/drivers/gpu/drm/panthor/panthor_heap.c > index 3be86ec383d6..683bb94761bc 100644 > --- a/drivers/gpu/drm/panthor/panthor_heap.c > +++ b/drivers/gpu/drm/panthor/panthor_heap.c > @@ -253,8 +253,8 @@ int panthor_heap_destroy(struct panthor_heap_pool *pool, > u32 handle) > * @pool: Pool to instantiate the heap context from. > * @initial_chunk_count: Number of chunk allocated at initialization time. > * Must be at least 1. > - * @chunk_size: The size of each chunk. Must be a power of two between 256k > - * and 2M. > + * @chunk_size: The size of each chunk. Must be page-aligned and lie in the > + * [128k:2M] range. Probably a typo, but I guess this should be [128k:8M] ? > * @max_chunks: Maximum number of chunks that can be allocated. > * @target_in_flight: Maximum number of in-flight render passes.
> * @heap_ctx_gpu_va: Pointer holding the GPU address of the allocated heap > @@ -284,8 +284,8 @@ int panthor_heap_create(struct panthor_heap_pool *pool, > if (initial_chunk_count > max_chunks) > return -EINVAL; > > - if (hweight32(chunk_size) != 1 || > - chunk_size < SZ_256K || chunk_size > SZ_2M) > + if (!IS_ALIGNED(chunk_size, PAGE_SIZE) || > + chunk_size < SZ_128K || chunk_size > SZ_8M) > return -EINVAL; > > down_read(&pool->lock); > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > index 5db80a0682d5..b8220d2e698f 100644 > --- a/include/uapi/drm/panthor_drm.h > +++ b/include/uapi/drm/panthor_drm.h > @@ -898,7 +898,11 @@ struct drm_panthor_tiler_heap_create { > /** @initial_chunk_count: Initial number of chunks to allocate. Must be > at least one. */ > __u32 initial_chunk_count; > > - /** @chunk_size: Chunk size. Must be a power of two at least 256KB > large. */ > + /** > + * @chunk_size: Chunk size. > + * > + * Must be page-aligned and lie in the [128k:8M] range. > + */ > __u32 chunk_size; > > /** > -- > 2.44.0 Adrian Larumbe
Re: [PATCH] drm/sysfs: Add drm class-wide attribute to get active device clients
Hi Tvrtko, On 15.04.2024 13:50, Tvrtko Ursulin wrote: > > On 05/04/2024 18:59, Rob Clark wrote: > > On Wed, Apr 3, 2024 at 11:37 AM Adrián Larumbe > > wrote: > > > > > > Up to this day, all fdinfo-based GPU profilers must traverse the entire > > > /proc directory structure to find open DRM clients with fdinfo file > > > descriptors. This is inefficient and time-consuming. > > > > > > This patch adds a new device class attribute that will install a sysfs > > > file > > > per DRM device, which can be queried by profilers to get a list of PIDs > > > for > > > their open clients. This file isn't human-readable, and it's meant to be > > > queried only by GPU profilers like gputop and nvtop. > > > > > > Cc: Boris Brezillon > > > Cc: Tvrtko Ursulin > > > Cc: Christopher Healy > > > Signed-off-by: Adrián Larumbe > > > > It does seem like a good idea.. idk if there is some precedent to > > prefer binary vs ascii in sysfs, but having a way to avoid walking > > _all_ processes is a good idea. > > I naturally second that it is a needed feature, but I do not think binary > format is justified. AFAIR it should be used for things like hw/fw > standardised tables or firmware images, not when exporting a simple list of > PIDs. It also precludes easy shell/script access and the benefit of avoiding > parsing a short list is I suspect completely dwarfed by needing to parse all > the related fdinfo etc. I'd rather keep it as a binary file for the sake of easily parsing the number list on the client side, in gputop or nvtop. For textual access, there's already a debugfs file that presents the same information, so I thought it was best not to duplicate that functionality and restrict sysfs to serving the very specific use case of UM profilers having to access the DRM client list. I should mention I did something controversial here, which is a semantically binary attribute through the regular attribute interface. 
I guess if I keep it as a binary attribute in the end, I should switch over to the binary attribute API. Another reason why I implemented it as a binary file is that we can only send back at most a whole page. If a PID takes 4 bytes, that's usually 1024 clients at most, which is probably enough for any UM profiler, but will decrease even more if we turn it into an ASCII readable file. I did some research into sysfs binary attributes, and while some sources mention that it's often used for dumping or loading of driver FW, none of them claim it cannot be used for other purposes. > > > --- > > > drivers/gpu/drm/drm_internal.h | 2 +- > > > drivers/gpu/drm/drm_privacy_screen.c | 2 +- > > > drivers/gpu/drm/drm_sysfs.c | 89 ++-- > > > 3 files changed, 74 insertions(+), 19 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_internal.h > > > b/drivers/gpu/drm/drm_internal.h > > > index 2215baef9a3e..9a399b03d11c 100644 > > > --- a/drivers/gpu/drm/drm_internal.h > > > +++ b/drivers/gpu/drm/drm_internal.h > > > @@ -145,7 +145,7 @@ bool drm_master_internal_acquire(struct drm_device > > > *dev); > > > void drm_master_internal_release(struct drm_device *dev); > > > > > > /* drm_sysfs.c */ > > > -extern struct class *drm_class; > > > +extern struct class drm_class; > > > > > > int drm_sysfs_init(void); > > > void drm_sysfs_destroy(void); > > > diff --git a/drivers/gpu/drm/drm_privacy_screen.c > > > b/drivers/gpu/drm/drm_privacy_screen.c > > > index 6cc39e30781f..2fbd24ba5818 100644 > > > --- a/drivers/gpu/drm/drm_privacy_screen.c > > > +++ b/drivers/gpu/drm/drm_privacy_screen.c > > > @@ -401,7 +401,7 @@ struct drm_privacy_screen > > > *drm_privacy_screen_register( > > > mutex_init(&priv->lock); > > > BLOCKING_INIT_NOTIFIER_HEAD(&priv->notifier_head); > > > > > > - priv->dev.class = drm_class; > > > + priv->dev.class = &drm_class; > > > priv->dev.type = &drm_privacy_screen_type; > > > priv->dev.parent = parent; > > > priv->dev.release = drm_privacy_screen_device_release; > > > diff --git
a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c > > > index a953f69a34b6..56ca9e22c720 100644 > > > --- a/drivers/gpu/drm/drm_sysfs.c > > > +++ b/drivers/gpu/drm/drm_sysfs.c > > > @@ -58,8 +58,6 @@ static struct device_type drm_sysfs_device_connector = { > > > .name = "drm_connector", > > >
[PATCH v2] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which a dma-buf attachment is being prepared on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost GEM object's pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: 00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by assuming the object's reservation had already been locked by the time we reach panfrost_gem_pin. Do the same thing for the Lima driver, as it most likely suffers from the same issue. Cc: Thomas Zimmermann Cc: Dmitry Osipenko Cc: Boris Brezillon Cc: Steven Price Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Reviewed-by: Boris Brezillon Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/lima/lima_gem.c | 9 +++-- drivers/gpu/drm/panfrost/panfrost_gem.c | 8 +++- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index 7ea244d876ca..8a5bcf498ef6 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -184,8 +184,13 @@ static int lima_gem_pin(struct drm_gem_object *obj) if (bo->heap_size) return -EINVAL; - - return drm_gem_shmem_pin(&bo->base); + /* +* Pinning can only happen in response to a prime attachment request +* from another driver, but dma reservation locking is already being +* handled by drm_gem_pin +*/ + drm_WARN_ON(obj->dev, obj->import_attach); + return drm_gem_shmem_object_pin(obj); } static int lima_gem_vmap(struct drm_gem_object *obj, struct
iosys_map *map) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index d47b40b82b0b..e3fbcb020617 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -192,7 +192,13 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) if (bo->is_heap) return -EINVAL; - return drm_gem_shmem_pin(&bo->base); + /* +* Pinning can only happen in response to a prime attachment request +* from another driver, but dma reservation locking is already being +* handled by drm_gem_pin +*/ + drm_WARN_ON(obj->dev, obj->import_attach); + return drm_gem_shmem_object_pin(obj); } static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) -- 2.44.0
[PATCH v2 3/3] drm/panthor: Enable fdinfo for memory stats
When vm-binding an already-created BO, the entirety of its virtual size is then backed by system memory, so its RSS is always the same as its virtual size. Also, we consider a PRIME imported BO to be resident if its matching dma_buf has an open attachment, which means its backing storage had already been allocated. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_gem.c | 12 1 file changed, 12 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index d6483266d0c2..386c0dfeeb5f 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.base.import_attach || bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.44.0
[PATCH v2 2/3] drm/panthor: Add DRM fdinfo support
Drawing from the FW-calculated values in the previous commit, we can increase the numbers for an open file by collecting them from finished jobs when updating their group synchronisation objects. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 +++ drivers/gpu/drm/panthor/panthor_sched.c | 46 +++ 4 files changed, 98 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index c6d3c327cc24..5eededaeade7 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 2fdd671b38fd..b5b5dfe3cafe 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ struct page *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index b8a84f26b3ef..6d25385e02a1 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,12 +3,17 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. 
*/ +#ifdef CONFIG_ARM_ARCH_TIMER +#include +#endif + #include #include #include #include #include #include +#include #include #include @@ -1351,6 +1356,30 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1360,6 +1389,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mmap, + .show_fdinfo = drm_show_fdinfo, }; #ifdef CONFIG_DEBUG_FS @@ -1378,6 +1408,7 @@ static const struct drm_driver panthor_drm_driver = { DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA, .open = panthor_
[PATCH v2 0/3] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Changelog: v2: - Split original first patch in two, one for FW CS cycle and timestamp calculations and job accounting memory management, and a second one that enables fdinfo. - Moved NUM_INSTRS_PER_SLOT to the file prelude - Removed nelem variable from the group's struct definition. - Precompute size of group's syncobj BO to avoid code duplication. - Some minor nits. Adrián Larumbe (3): drm/panthor: introduce job cycle and timestamp accounting drm/panthor: Add DRM fdinfo support drm/panthor: Enable fdinfo for memory stats drivers/gpu/drm/panthor/panthor_devfreq.c | 10 ++ drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 drivers/gpu/drm/panthor/panthor_gem.c | 12 ++ drivers/gpu/drm/panthor/panthor_sched.c | 204 +++--- 5 files changed, 244 insertions(+), 24 deletions(-) base-commit: a6325ad47bc808aeb4c69ae36e0236c2c6d400b5 -- 2.44.0
[PATCH v2 1/3] drm/panthor: introduce job cycle and timestamp accounting
Enable calculations of job submission times in clock cycles and wall time. This is done by expanding the boilerplate command stream when running a job to include instructions that compute said times right before and after a user CS. Those numbers are stored in the queue's group's sync objects BO, right after them. Because the queues in a group might have a different number of slots, one must keep track of the overall slot tally when reckoning the offset of a queue's time sample structs, one for each slot. NUM_INSTRS_PER_SLOT had to be increased to 32 because of adding new FW instructions for storing and subtracting the cycle counter and timestamp register, and it must always remain a power of two. This commit is done in preparation for enabling DRM fdinfo support in the Panthor driver, which depends on the numbers calculated herein. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_sched.c | 158 1 file changed, 134 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c index b3a51a6de523..320dfa0388ba 100644 --- a/drivers/gpu/drm/panthor/panthor_sched.c +++ b/drivers/gpu/drm/panthor/panthor_sched.c @@ -93,6 +93,9 @@ #define MIN_CSGS 3 #define MAX_CSG_PRIO 0xf +#define NUM_INSTRS_PER_SLOT 32 +#define SLOTSIZE (NUM_INSTRS_PER_SLOT * sizeof(u64)) + struct panthor_group; /** @@ -466,6 +469,9 @@ struct panthor_queue { */ struct list_head in_flight_jobs; } fence_ctx; + + /** @time_offset: Offset of panthor_job_times structs in group's syncobj bo. */ + unsigned long time_offset; }; /** @@ -580,7 +586,17 @@ struct panthor_group { * One sync object per queue. The position of the sync object is * determined by the queue index. */ - struct panthor_kernel_bo *syncobjs; + + struct { + /** @bo: Kernel BO holding the sync objects. */ + struct panthor_kernel_bo *bo; + + /** +* @times_offset: Beginning of panthor_job_times struct samples after +* the group's array of sync objects.
+*/ + size_t times_offset; + } syncobjs; /** @state: Group state. */ enum panthor_group_state state; @@ -639,6 +655,18 @@ struct panthor_group { struct list_head wait_node; }; +struct panthor_job_times { + struct { + u64 before; + u64 after; + } cycles; + + struct { + u64 before; + u64 after; + } time; +}; + /** * group_queue_work() - Queue a group work * @group: Group to queue the work for. @@ -718,6 +746,9 @@ struct panthor_job { /** @queue_idx: Index of the queue inside @group. */ u32 queue_idx; + /** @ringbuf_idx: Index of the ringbuffer inside @queue. */ + u32 ringbuf_idx; + /** @call_info: Information about the userspace command stream call. */ struct { /** @start: GPU address of the userspace command stream. */ @@ -833,7 +864,7 @@ static void group_release_work(struct work_struct *work) panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), group->suspend_buf); panthor_kernel_bo_destroy(panthor_fw_vm(ptdev), group->protm_suspend_buf); - panthor_kernel_bo_destroy(group->vm, group->syncobjs); + panthor_kernel_bo_destroy(group->vm, group->syncobjs.bo); panthor_vm_put(group->vm); kfree(group); @@ -1924,8 +1955,6 @@ tick_ctx_init(struct panthor_scheduler *sched, } } -#define NUM_INSTRS_PER_SLOT16 - static void group_term_post_processing(struct panthor_group *group) { @@ -1962,7 +1991,7 @@ group_term_post_processing(struct panthor_group *group) spin_unlock(>fence_ctx.lock); /* Manually update the syncobj seqno to unblock waiters. 
*/ - syncobj = group->syncobjs->kmap + (i * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (i * sizeof(*syncobj)); syncobj->status = ~0; syncobj->seqno = atomic64_read(>fence_ctx.seqno); sched_queue_work(group->ptdev->scheduler, sync_upd); @@ -2729,7 +2758,7 @@ static void group_sync_upd_work(struct work_struct *work) if (!queue) continue; - syncobj = group->syncobjs->kmap + (queue_idx * sizeof(*syncobj)); + syncobj = group->syncobjs.bo->kmap + (queue_idx * sizeof(*syncobj)); spin_lock(>fence_ctx.lock); list_for_each_entry_safe(job, job_tmp, >fence_ctx.in_flight_jobs, node) { @@ -2764,15 +2793,23 @@ queue_run_job(struct drm_sched_job *sched_job)
Re: [PATCH 1/2] drm/panthor: Enable fdinfo for cycle and time measurements
Hi Liviu, Thanks for your review. Also want to apologise for replying so late. Today I'll be sending a v2 of this patch series after having applied all your suggestions. On 28.03.2024 15:49, Liviu Dudau wrote: > Hi Adrián, > > Appologies for the delay in reviewing this. > > On Tue, Mar 05, 2024 at 09:05:49PM +, Adrián Larumbe wrote: > > These values are sampled by the firmware right before jumping into the UM > > command stream and immediately after returning from it, and then kept > > inside a > > per-job accounting structure. That structure is held inside the group's > > syncobjs > > buffer object, at an offset that depends on the job's queue slot number and > > the > > queue's index within the group. > > I think this commit message is misleadingly short compared to the size of the > changes. If I may, I would like to suggest that you split this commit into two > parts, one introducing the changes in the ringbuf and syncobjs structures and > the other exporting the statistics in the fdinfo. 
> > > > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + > > drivers/gpu/drm/panthor/panthor_device.h | 11 ++ > > drivers/gpu/drm/panthor/panthor_drv.c | 31 > > drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- > > 4 files changed, 241 insertions(+), 28 deletions(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c > > b/drivers/gpu/drm/panthor/panthor_devfreq.c > > index 7ac4fa290f27..51a7b734edcd 100644 > > --- a/drivers/gpu/drm/panthor/panthor_devfreq.c > > +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c > > @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device > > *dev, > > spin_lock_irqsave(>lock, irqflags); > > > > panthor_devfreq_update_utilization(pdevfreq); > > + ptdev->current_frequency = status->current_frequency; > > > > status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, > >pdevfreq->idle_time)); > > @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > struct panthor_devfreq *pdevfreq; > > struct dev_pm_opp *opp; > > unsigned long cur_freq; > > + unsigned long freq = ULONG_MAX; > > int ret; > > > > pdevfreq = drmm_kzalloc(>base, sizeof(*ptdev->devfreq), > > GFP_KERNEL); > > @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) > > > > dev_pm_opp_put(opp); > > > > + /* Find the fastest defined rate */ > > + opp = dev_pm_opp_find_freq_floor(dev, ); > > + if (IS_ERR(opp)) > > + return PTR_ERR(opp); > > + ptdev->fast_rate = freq; > > + > > + dev_pm_opp_put(opp); > > + > > /* > > * Setup default thresholds for the simple_ondemand governor. > > * The values are chosen based on experiments. 
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h > > b/drivers/gpu/drm/panthor/panthor_device.h > > index 51c9d61b6796..10e970921ca3 100644 > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > @@ -162,6 +162,14 @@ struct panthor_device { > > */ > > u32 *dummy_latest_flush; > > } pm; > > + > > + unsigned long current_frequency; > > + unsigned long fast_rate; > > +}; > > + > > +struct panthor_gpu_usage { > > + u64 time; > > + u64 cycles; > > }; > > > > /** > > @@ -176,6 +184,9 @@ struct panthor_file { > > > > /** @groups: Scheduling group pool attached to this file. */ > > struct panthor_group_pool *groups; > > + > > + /** @stats: cycle and timestamp measures for job execution. */ > > + struct panthor_gpu_usage stats; > > }; > > > > int panthor_device_init(struct panthor_device *ptdev); > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c > > b/drivers/gpu/drm/panthor/panthor_drv.c > > index ff484506229f..fa06b9e2c6cd 100644 > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > @@ -3,6 +3,10 @@ > > /* Copyright 2019 Linaro, Ltd., Rob Herring */ > > /* Copyright 2019 Collabora ltd. */ > > > > +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER > > +#include > > +#endif > > + > > #include > > #include > > #include > > @@ -28,6 +32,8 @@ > > #i
[PATCH] drm/panfrost: Fix dma_resv deadlock at drm object pin time
When Panfrost must pin an object for which another driver is preparing a
dma-buf attachment, the core drm gem object pinning code already takes a
lock on the object's dma reservation. However, Panfrost's GEM object
pinning callback would then try to take the lock on the same dma
reservation when delegating pinning of the object to the shmem subsystem,
which led to a deadlock.

This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which
reports the following recursive locking situation:

weston/3440 is trying to acquire lock:
00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper]

but task is already holding lock:
00e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm]

Fix it by assuming the object's reservation had already been locked by the
time we reach panfrost_gem_pin.

Cc: Thomas Zimmermann
Cc: Dmitry Osipenko
Cc: Boris Brezillon
Cc: Steven Price
Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()")
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/panfrost/panfrost_gem.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index d47b40b82b0b..6c26652d425d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -192,7 +192,12 @@ static int panfrost_gem_pin(struct drm_gem_object *obj)
 	if (bo->is_heap)
 		return -EINVAL;
 
-	return drm_gem_shmem_pin(&bo->base);
+	/*
+	 * Pinning can only happen in response to a prime attachment request from
+	 * another driver, but that's already being handled by drm_gem_pin
+	 */
+	drm_WARN_ON(obj->dev, obj->import_attach);
+	return drm_gem_shmem_pin_locked(&bo->base);
 }
 
 static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj)

base-commit: 04687bff66b8a4b22748aa7215d3baef0b318e5b
-- 
2.44.0
Re: [PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
On 04.04.2024 11:31, Maíra Canal wrote: > On 4/4/24 11:00, Adrián Larumbe wrote: > > This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose > > the total GPU usage stats on sysfs"). The point is making broader GPU > > occupancy numbers available through the sysfs interface, so that for every > > job slot, its number of processed jobs and total processing time are > > displayed. > > Shouldn't we make this sysfs interface a generic DRM interface? > Something that would be standard for all drivers and that we could > integrate into gputop in the future. I think the best way to generalise this sysfs knob would be to create a DRM class attribute somewhere in drivers/gpu/drm/drm_sysfs.c and then adding a new function to 'struct drm_driver' that would return a structure with the relevant information (execution units and their names, number of processed jobs, etc). What that information would exactly be is up to debate, I guess, since different drivers might be interested in showing different bits of information. Laying that down is important because the sysfs file would become part of the device class API. I might come up with a new RFC patch series that does precisely that, at least for v3d and Panfrost, and maybe other people could pitch in with the sort of things they'd like to see for other drivers? 
Cheers, Adrian > Best Regards, > - Maíra > > > > > Cc: Boris Brezillon > > Cc: Christopher Healy > > Signed-off-by: Adrián Larumbe > > --- > > drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ > > drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- > > drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- > > drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ > > 4 files changed, 68 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h > > b/drivers/gpu/drm/panfrost/panfrost_device.h > > index cffcb0ac7c11..1d343351c634 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_device.h > > +++ b/drivers/gpu/drm/panfrost/panfrost_device.h > > @@ -169,6 +169,11 @@ struct panfrost_engine_usage { > > unsigned long long cycles[NUM_JOB_SLOTS]; > > }; > > +struct panfrost_slot_usage { > > + u64 enabled_ns; > > + u64 jobs_sent; > > +}; > > + > > struct panfrost_file_priv { > > struct panfrost_device *pfdev; > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c > > b/drivers/gpu/drm/panfrost/panfrost_drv.c > > index ef9f6c0716d5..6afcde66270f 100644 > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c > > @@ -8,6 +8,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc > > panfrost_drm_driver_ioctls[] = { > > PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), > > }; > > +static const char * const engine_names[] = { > > + "fragment", "vertex-tiler", "compute-only" > > +}; > > + > > static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, > > struct panfrost_file_priv *panfrost_priv, > > struct drm_printer *p) > > @@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct > > panfrost_device *pfdev, > > * job spent on the GPU. 
> > */ > > - static const char * const engine_names[] = { > > - "fragment", "vertex-tiler", "compute-only" > > - }; > > - > > BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); > > for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { > > @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, > > static DEVICE_ATTR_RW(profiling); > > +static ssize_t > > +gpu_stats_show(struct device *dev, struct device_attribute *attr, char > > *buf) > > +{ > > + struct panfrost_device *pfdev = dev_get_drvdata(dev); > > + struct panfrost_slot_usage stats; > > + u64 timestamp = local_clock(); > > + ssize_t len = 0; > > + unsigned int i; > > + > > + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); > > + > > + len += sysfs_emit(buf, "queuetimestampjobs > > runtime\n"); > > + len += sysfs_emit_at(buf, len, > > "-\n"); > > + > > + for (i = 0; i < NUM_JOB_SLOTS - 1;
[PATCH] drm/panfrost: Show overall GPU usage stats through sysfs knob
This changeset is heavily inspired by commit 509433d8146c ("drm/v3d: Expose the total GPU usage stats on sysfs"). The point is making broader GPU occupancy numbers available through the sysfs interface, so that for every job slot, its number of processed jobs and total processing time are displayed. Cc: Boris Brezillon Cc: Christopher Healy Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_device.h | 5 +++ drivers/gpu/drm/panfrost/panfrost_drv.c| 49 -- drivers/gpu/drm/panfrost/panfrost_job.c| 17 +++- drivers/gpu/drm/panfrost/panfrost_job.h| 3 ++ 4 files changed, 68 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index cffcb0ac7c11..1d343351c634 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -169,6 +169,11 @@ struct panfrost_engine_usage { unsigned long long cycles[NUM_JOB_SLOTS]; }; +struct panfrost_slot_usage { + u64 enabled_ns; + u64 jobs_sent; +}; + struct panfrost_file_priv { struct panfrost_device *pfdev; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index ef9f6c0716d5..6afcde66270f 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -524,6 +525,10 @@ static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = { PANFROST_IOCTL(MADVISE, madvise,DRM_RENDER_ALLOW), }; +static const char * const engine_names[] = { + "fragment", "vertex-tiler", "compute-only" +}; + static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, struct panfrost_file_priv *panfrost_priv, struct drm_printer *p) @@ -543,10 +548,6 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, * job spent on the GPU. 
*/ - static const char * const engine_names[] = { - "fragment", "vertex-tiler", "compute-only" - }; - BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { @@ -716,8 +717,48 @@ static ssize_t profiling_store(struct device *dev, static DEVICE_ATTR_RW(profiling); +static ssize_t +gpu_stats_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct panfrost_device *pfdev = dev_get_drvdata(dev); + struct panfrost_slot_usage stats; + u64 timestamp = local_clock(); + ssize_t len = 0; + unsigned int i; + + BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); + + len += sysfs_emit(buf, "queuetimestampjobs runtime\n"); + len += sysfs_emit_at(buf, len, "-\n"); + + for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { + + stats = get_slot_stats(pfdev, i); + + /* +* Each line will display the slot name, timestamp, the number +* of jobs handled by that engine and runtime, as shown below: +* +* queuetimestampjobsruntime +* - +* fragment 12252943467507 638 1184747640 +* vertex-tiler 12252943467507 636 121663838 +* +*/ + len += sysfs_emit_at(buf, len, "%-13s%-17llu%-12llu%llu\n", +engine_names[i], +timestamp, +stats.jobs_sent, +stats.enabled_ns); + } + + return len; +} +static DEVICE_ATTR_RO(gpu_stats); + static struct attribute *panfrost_attrs[] = { _attr_profiling.attr, + _attr_gpu_stats.attr, NULL, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index a61ef0af9a4e..4c779e6f4cb0 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -31,6 +31,8 @@ struct panfrost_queue_state { struct drm_gpu_scheduler sched; u64 fence_context; u64 emit_seqno; + + struct panfrost_slot_usage stats; }; struct panfrost_job_slot { @@ -160,15 +162,20 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) WARN_ON(!job); if (job->is_profiled) { + u64 job_time = ktime_to_ns(ktime_sub(ktime_get(), job->start_time)); + if (job->engine_usage) { - 
job->
[PATCH] drm/sysfs: Add drm class-wide attribute to get active device clients
Up to this day, all fdinfo-based GPU profilers must traverse the entire
/proc directory structure to find open DRM clients with fdinfo file
descriptors. This is inefficient and time-consuming.

This patch adds a new device class attribute that will install a sysfs
file per DRM device, which can be queried by profilers to get a list of
PIDs for their open clients. This file isn't human-readable, and it's
meant to be queried only by GPU profilers like gputop and nvtop.

Cc: Boris Brezillon
Cc: Tvrtko Ursulin
Cc: Christopher Healy
Signed-off-by: Adrián Larumbe
---
 drivers/gpu/drm/drm_internal.h       |  2 +-
 drivers/gpu/drm/drm_privacy_screen.c |  2 +-
 drivers/gpu/drm/drm_sysfs.c          | 89 ++++++++++++++++++++++------
 3 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 2215baef9a3e..9a399b03d11c 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -145,7 +145,7 @@ bool drm_master_internal_acquire(struct drm_device *dev);
 void drm_master_internal_release(struct drm_device *dev);
 
 /* drm_sysfs.c */
-extern struct class *drm_class;
+extern struct class drm_class;
 
 int drm_sysfs_init(void);
 void drm_sysfs_destroy(void);
diff --git a/drivers/gpu/drm/drm_privacy_screen.c b/drivers/gpu/drm/drm_privacy_screen.c
index 6cc39e30781f..2fbd24ba5818 100644
--- a/drivers/gpu/drm/drm_privacy_screen.c
+++ b/drivers/gpu/drm/drm_privacy_screen.c
@@ -401,7 +401,7 @@ struct drm_privacy_screen *drm_privacy_screen_register(
 	mutex_init(&priv->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&priv->notifier_head);
 
-	priv->dev.class = drm_class;
+	priv->dev.class = &drm_class;
 	priv->dev.type = &drm_privacy_screen_type;
 	priv->dev.parent = parent;
 	priv->dev.release = drm_privacy_screen_device_release;
diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
index a953f69a34b6..56ca9e22c720 100644
--- a/drivers/gpu/drm/drm_sysfs.c
+++ b/drivers/gpu/drm/drm_sysfs.c
@@ -58,8 +58,6 @@ static struct device_type drm_sysfs_device_connector = {
 	.name = "drm_connector",
 };
 
-struct class *drm_class;
-
 #ifdef CONFIG_ACPI
 static bool drm_connector_acpi_bus_match(struct device *dev)
 {
@@ -128,6 +126,62 @@ static const struct component_ops typec_connector_ops = {
 
 static CLASS_ATTR_STRING(version, S_IRUGO, "drm 1.1.0 20060810");
 
+static ssize_t clients_show(struct device *cd, struct device_attribute *attr, char *buf)
+{
+	struct drm_minor *minor = cd->driver_data;
+	struct drm_device *ddev = minor->dev;
+	struct drm_file *priv;
+	ssize_t offset = 0;
+	void *pid_buf;
+
+	if (minor->type != DRM_MINOR_RENDER)
+		return 0;
+
+	pid_buf = kvmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!pid_buf)
+		return 0;
+
+	mutex_lock(&ddev->filelist_mutex);
+	list_for_each_entry_reverse(priv, &ddev->filelist, lhead) {
+		struct pid *pid;
+
+		if (drm_WARN_ON(ddev, (PAGE_SIZE - offset) < sizeof(pid_t)))
+			break;
+
+		rcu_read_lock();
+		pid = rcu_dereference(priv->pid);
+		(*(pid_t *)(pid_buf + offset)) = pid_vnr(pid);
+		rcu_read_unlock();
+
+		offset += sizeof(pid_t);
+	}
+	mutex_unlock(&ddev->filelist_mutex);
+
+	if (offset < PAGE_SIZE)
+		(*(pid_t *)(pid_buf + offset)) = 0;
+
+	memcpy(buf, pid_buf, offset);
+
+	kvfree(pid_buf);
+
+	return offset;
+
+}
+static DEVICE_ATTR_RO(clients);
+
+static struct attribute *drm_device_attrs[] = {
+	&dev_attr_clients.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(drm_device);
+
+struct class drm_class = {
+	.name = "drm",
+	.dev_groups = drm_device_groups,
+};
+
+static bool drm_class_initialised;
+
 /**
  * drm_sysfs_init - initialize sysfs helpers
  *
@@ -142,18 +196,19 @@ int drm_sysfs_init(void)
 {
 	int err;
 
-	drm_class = class_create("drm");
-	if (IS_ERR(drm_class))
-		return PTR_ERR(drm_class);
+	err = class_register(&drm_class);
+	if (err)
+		return err;
 
-	err = class_create_file(drm_class, &class_attr_version.attr);
+	err = class_create_file(&drm_class, &class_attr_version.attr);
 	if (err) {
-		class_destroy(drm_class);
-		drm_class = NULL;
+		class_destroy(&drm_class);
 		return err;
 	}
 
-	drm_class->devnode = drm_devnode;
+	drm_class.devnode = drm_devnode;
+
+	drm_class_initialised = true;
 
 	drm_sysfs_acpi_register();
 	return 0;
@@ -166,12 +221,12 @@ int drm_sysfs_init(void)
 */
void drm_sysfs_destroy(void)
{
-	if (IS_ERR_OR_NULL(drm_class))
+	if (!drm_class_initialised)
		return;
	drm_sysfs_acpi_unregister()
[PATCH] drm/panfrost: Only display fdinfo's engine and cycle tags when profiling is on
If job accounting is disabled, then both fdinfo's drm-engine and drm-cycle key values will remain immutable. In that case, it makes more sense not to display them at all to avoid confusing user space profiling tools. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_drv.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index eec250114114..ef9f6c0716d5 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -550,10 +550,12 @@ static void panfrost_gpu_show_fdinfo(struct panfrost_device *pfdev, BUILD_BUG_ON(ARRAY_SIZE(engine_names) != NUM_JOB_SLOTS); for (i = 0; i < NUM_JOB_SLOTS - 1; i++) { - drm_printf(p, "drm-engine-%s:\t%llu ns\n", - engine_names[i], panfrost_priv->engine_usage.elapsed_ns[i]); - drm_printf(p, "drm-cycles-%s:\t%llu\n", - engine_names[i], panfrost_priv->engine_usage.cycles[i]); + if (pfdev->profile_mode) { + drm_printf(p, "drm-engine-%s:\t%llu ns\n", + engine_names[i], panfrost_priv->engine_usage.elapsed_ns[i]); + drm_printf(p, "drm-cycles-%s:\t%llu\n", + engine_names[i], panfrost_priv->engine_usage.cycles[i]); + } drm_printf(p, "drm-maxfreq-%s:\t%lu Hz\n", engine_names[i], pfdev->pfdevfreq.fast_rate); drm_printf(p, "drm-curfreq-%s:\t%lu Hz\n", base-commit: 97252d0a4bfbb07079503d059f7522d305fe0f7a -- 2.43.0
Re: [PATCH v3 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
On 11.03.2024 11:02, Boris Brezillon wrote: > On Wed, 6 Mar 2024 08:33:47 + > Tvrtko Ursulin wrote: > > > On 06/03/2024 01:56, Adrián Larumbe wrote: > > > Debugfs isn't always available in production builds that try to squeeze > > > every single byte out of the kernel image, but we still need a way to > > > toggle the timestamp and cycle counter registers so that jobs can be > > > profiled for fdinfo's drm engine and cycle calculations. > > > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > > the same functionality, and document its ABI in a separate file. > > > > > > Signed-off-by: Adrián Larumbe > > > --- > > > .../testing/sysfs-driver-panfrost-profiling | 10 + > > > Documentation/gpu/panfrost.rst| 9 > > > drivers/gpu/drm/panfrost/Makefile | 2 - > > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- > > > drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- > > > drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- > > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > > 8 files changed, 57 insertions(+), 44 deletions(-) > > > create mode 100644 > > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > new file mode 100644 > > > index ..1d8bb0978920 > > > --- /dev/null > > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > > @@ -0,0 +1,10 @@ > > > +What:/sys/bus/platform/drivers/panfrost/.../profiling > > > +Date:February 2024 > > > +KernelVersion: 6.8.0 > > > +Contact: Adrian Larumbe > > > +Description: > > > + Get/set drm fdinfo's engine and cycles profiling status. > > > + Valid values are: > > > + 0: Don't enable fdinfo job profiling sources. 
> > > + 1: Enable fdinfo job profiling sources, this enables both the > > > GPU's > > > +timestamp and cycle counter registers. > > > \ No newline at end of file > > > diff --git a/Documentation/gpu/panfrost.rst > > > b/Documentation/gpu/panfrost.rst > > > index b80e41f4b2c5..51ba375fd80d 100644 > > > --- a/Documentation/gpu/panfrost.rst > > > +++ b/Documentation/gpu/panfrost.rst > > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > > `drm-curfreq-` values convey the current operating frequency for that > > > engine. > > > + > > > +Users must bear in mind that engine and cycle sampling are disabled by > > > default, > > > +because of power saving concerns. `fdinfo` users and benchmark > > > applications which > > > +query the fdinfo file must make sure to toggle the job profiling status > > > of the > > > +driver by writing into the appropriate sysfs node:: > > > + > > > +echo > > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/profiling > > > > A late thought - how it would work to not output the inactive fdinfo > > keys when this knob is not enabled? > > > > Generic userspace like gputop already handles that and wouldn't show the > > stat. Which may be more user friendly than showing stats permanently at > > zero. It may be moot once you add the auto-toggle to gputop (or so) but > > perhaps worth considering. > > I agree with Tvrtko, if the line being printed in fdinfo relies on some > sysfs knob to be valid, we'd rather not print the information in that > case, instead of printing zero. Me too. I'll go first change both gputop and nvtop to make sure they use the new sysfs knob for Panfrost, and then submit a new patch that handles printing of the drm-cycles-* and drm-engine-* stats depending on the profiling knob state.
[PATCH v3 0/1] drm/panfrost: Replace fdinfo's profiling debugfs knob
This is v3 of the patch already discussed in [2] and [1] Changelog: v3: - Replaced manual kobj initialisation with a device attribute - Handle user input with kstrtobool instead of treating it as an uint v2: - Turned the profile mode atomic variable into a boolean - Rewrote the sysfs file's uAPI documentation to make it more generic - Improved the casting of the profiling variable inside the Panfrost device structure [2]https://lore.kernel.org/dri-devel/20240302154845.3223223-2-adrian.laru...@collabora.com/ [1]https://lore.kernel.org/dri-devel/20240221161237.2478193-1-adrian.laru...@collabora.com/ Adrián Larumbe (1): drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs .../testing/sysfs-driver-panfrost-profiling | 10 + Documentation/gpu/panfrost.rst| 9 drivers/gpu/drm/panfrost/Makefile | 2 - drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 8 files changed, 57 insertions(+), 44 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: e635b7eb7062b464bbd9795308b1a80eac0b01f5 -- 2.43.0
[PATCH v3 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 + Documentation/gpu/panfrost.rst| 9 drivers/gpu/drm/panfrost/Makefile | 2 - drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 --- drivers/gpu/drm/panfrost/panfrost_device.h| 2 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 41 --- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- 8 files changed, 57 insertions(+), 44 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..1d8bb0978920 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/platform/drivers/panfrost/.../profiling +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: + Get/set drm fdinfo's engine and cycles profiling status. + Valid values are: + 0: Don't enable fdinfo job profiling sources. + 1: Enable fdinfo job profiling sources, this enables both the GPU's + timestamp and cycle counter registers. 
\ No newline at end of file diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..51ba375fd80d 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..7da2b3f02ed9 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,6 +12,4 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o - obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates.
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 62f7e3527385..cffcb0ac7c11 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -130,7 +130,7 @@ struct panfrost_device { struct list_head scheduled_jobs; struct panfrost_perfcn
[PATCH 0/2] Support fdinfo runtime and memory stats on Panthor
This patch series enables userspace utilities like gputop and nvtop to query a render context's fdinfo file and figure out rates of engine and memory utilisation. Adrián Larumbe (2): drm/panthor: Enable fdinfo for cycle and time measurements drm/panthor: Enable fdinfo for memory stats drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 32 drivers/gpu/drm/panthor/panthor_gem.c | 12 ++ drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- 5 files changed, 254 insertions(+), 28 deletions(-) base-commit: e635b7eb7062b464bbd9795308b1a80eac0b01f5 -- 2.43.0
[PATCH 1/2] drm/panthor: Enable fdinfo for cycle and time measurements
These values are sampled by the firmware right before jumping into the UM command stream and immediately after returning from it, and then kept inside a per-job accounting structure. That structure is held inside the group's syncobjs buffer object, at an offset that depends on the job's queue slot number and the queue's index within the group. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_devfreq.c | 10 + drivers/gpu/drm/panthor/panthor_device.h | 11 ++ drivers/gpu/drm/panthor/panthor_drv.c | 31 drivers/gpu/drm/panthor/panthor_sched.c | 217 +++--- 4 files changed, 241 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c index 7ac4fa290f27..51a7b734edcd 100644 --- a/drivers/gpu/drm/panthor/panthor_devfreq.c +++ b/drivers/gpu/drm/panthor/panthor_devfreq.c @@ -91,6 +91,7 @@ static int panthor_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pdevfreq->lock, irqflags); panthor_devfreq_update_utilization(pdevfreq); + ptdev->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pdevfreq->busy_time, pdevfreq->idle_time)); @@ -130,6 +131,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev) struct panthor_devfreq *pdevfreq; struct dev_pm_opp *opp; unsigned long cur_freq; + unsigned long freq = ULONG_MAX; int ret; pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL); @@ -204,6 +206,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev) dev_pm_opp_put(opp); + /* Find the fastest defined rate */ + opp = dev_pm_opp_find_freq_floor(dev, &freq); + if (IS_ERR(opp)) + return PTR_ERR(opp); + ptdev->fast_rate = freq; + + dev_pm_opp_put(opp); + /* * Setup default thresholds for the simple_ondemand governor. * The values are chosen based on experiments. 
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 51c9d61b6796..10e970921ca3 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -162,6 +162,14 @@ struct panthor_device { */ u32 *dummy_latest_flush; } pm; + + unsigned long current_frequency; + unsigned long fast_rate; +}; + +struct panthor_gpu_usage { + u64 time; + u64 cycles; }; /** @@ -176,6 +184,9 @@ struct panthor_file { /** @groups: Scheduling group pool attached to this file. */ struct panthor_group_pool *groups; + + /** @stats: cycle and timestamp measures for job execution. */ + struct panthor_gpu_usage stats; }; int panthor_device_init(struct panthor_device *ptdev); diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ff484506229f..fa06b9e2c6cd 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -3,6 +3,10 @@ /* Copyright 2019 Linaro, Ltd., Rob Herring */ /* Copyright 2019 Collabora ltd. */ +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER +#include <asm/arch_timer.h> +#endif + #include #include #include @@ -28,6 +32,8 @@ #include "panthor_regs.h" #include "panthor_sched.h" +#define NS_PER_SEC 1000000000ULL + /** * DOC: user <-> kernel object copy helpers. 
*/ @@ -1336,6 +1342,29 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) return ret; } +static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev, + struct panthor_file *pfile, + struct drm_printer *p) +{ +#ifdef CONFIG_HAVE_ARM_ARCH_TIMER + drm_printf(p, "drm-engine-panthor:\t%llu ns\n", + DIV_ROUND_UP_ULL((pfile->stats.time * NS_PER_SEC), + arch_timer_get_cntfrq())); +#endif + drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles); + drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate); + drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency); +} + +static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) +{ + struct drm_device *dev = file->minor->dev; + struct panthor_device *ptdev = container_of(dev, struct panthor_device, base); + + panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + +} + static const struct file_operations panthor_drm_driver_fops = { .open = drm_open, .release = drm_release, @@ -1345,6 +1374,7 @@ static const struct file_operations panthor_drm_driver_fops = { .read = drm_read, .llseek = noop_llseek, .mmap = panthor_mma
[PATCH 2/2] drm/panthor: Enable fdinfo for memory stats
When vm-binding an already-created BO, the entirety of its virtual size is then backed by system memory, so its RSS is always the same as its virtual size. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/panthor_drv.c | 1 + drivers/gpu/drm/panthor/panthor_gem.c | 12 2 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index fa06b9e2c6cd..a5398e161f75 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1363,6 +1363,7 @@ static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file) panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p); + drm_show_memory_stats(p, file); } static const struct file_operations panthor_drm_driver_fops = { diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c index d6483266d0c2..845724e3fd93 100644 --- a/drivers/gpu/drm/panthor/panthor_gem.c +++ b/drivers/gpu/drm/panthor/panthor_gem.c @@ -143,6 +143,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags) return drm_gem_prime_export(obj, flags); } +static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj) +{ + struct panthor_gem_object *bo = to_panthor_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + return res; +} + static const struct drm_gem_object_funcs panthor_gem_funcs = { .free = panthor_gem_free_object, .print_info = drm_gem_shmem_object_print_info, @@ -152,6 +163,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = panthor_gem_mmap, + .status = panthor_gem_status, .export = panthor_gem_prime_export, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.43.0
[PATCH] drm/panthor: Add support for performance counters
This brings in support for Panthor's HW performance counters and querying them from UM through a specific ioctl(). The code is inspired by existing functionality for the Panfrost driver, with some noteworthy differences: - Sample size is now reported by the firmware rather than having to reckon it by hand - Counter samples are chained in a ring buffer that can be accessed concurrently, but only from threads within a single context (this is because of a HW limitation). - List of enabled counters must be explicitly told from UM - Rather than allocating the BO that will contain the perfcounter values in the render context's address space, the samples ring buffer is mapped onto the MCU's VM. - If more than one thread within the same context tries to dump a sample, then the kernel will copy the same frame to every single thread that was able to join the dump queue right before the FW finished processing the sample request. - UM must provide a BO handle for retrieval of perfcnt values rather than passing a user virtual address. The reason multicontext access to the driver's perfcnt ioctl interface isn't tolerated is because toggling a different set of counters than the current one implies a counter reset, which also messes up with the ring buffer's extraction and insertion pointers. This is an unfortunate hardware limitation. 
Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panthor/Makefile | 3 +- drivers/gpu/drm/panthor/panthor_device.c | 6 + drivers/gpu/drm/panthor/panthor_device.h | 6 + drivers/gpu/drm/panthor/panthor_drv.c | 61 +++ drivers/gpu/drm/panthor/panthor_fw.c | 27 ++ drivers/gpu/drm/panthor/panthor_fw.h | 12 + drivers/gpu/drm/panthor/panthor_perfcnt.c | 551 ++ drivers/gpu/drm/panthor/panthor_perfcnt.h | 31 ++ drivers/gpu/drm/panthor/panthor_sched.c | 1 + include/uapi/drm/panthor_drm.h| 72 +++ 10 files changed, 769 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panthor/panthor_perfcnt.c create mode 100644 drivers/gpu/drm/panthor/panthor_perfcnt.h diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile index 15294719b09c..7f841fd053d4 100644 --- a/drivers/gpu/drm/panthor/Makefile +++ b/drivers/gpu/drm/panthor/Makefile @@ -9,6 +9,7 @@ panthor-y := \ panthor_gpu.o \ panthor_heap.o \ panthor_mmu.o \ - panthor_sched.o + panthor_sched.o \ + panthor_perfcnt.o obj-$(CONFIG_DRM_PANTHOR) += panthor.o diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c index bfe8da4a6e4c..5dfd82891063 100644 --- a/drivers/gpu/drm/panthor/panthor_device.c +++ b/drivers/gpu/drm/panthor/panthor_device.c @@ -20,6 +20,7 @@ #include "panthor_mmu.h" #include "panthor_regs.h" #include "panthor_sched.h" +#include "panthor_perfcnt.h" static int panthor_clk_init(struct panthor_device *ptdev) { @@ -78,6 +79,7 @@ void panthor_device_unplug(struct panthor_device *ptdev) /* Now, try to cleanly shutdown the GPU before the device resources * get reclaimed. 
*/ + panthor_perfcnt_unplug(ptdev); panthor_sched_unplug(ptdev); panthor_fw_unplug(ptdev); panthor_mmu_unplug(ptdev); @@ -233,6 +235,10 @@ int panthor_device_init(struct panthor_device *ptdev) if (ret) goto err_unplug_fw; + ret = panthor_perfcnt_init(ptdev); + if (ret) + goto err_rpm_put; + /* ~3 frames */ pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50); pm_runtime_use_autosuspend(ptdev->base.dev); diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 51c9d61b6796..adf0bd29deb0 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -100,6 +100,9 @@ struct panthor_device { /** @csif_info: Command stream interface information. */ struct drm_panthor_csif_info csif_info; + /** @perfcnt_info: Performance counters interface information. */ + struct drm_panthor_perfcnt_info perfcnt_info; + /** @gpu: GPU management data. */ struct panthor_gpu *gpu; @@ -127,6 +130,9 @@ struct panthor_device { /** @done: ... */ struct completion done; } unplug; + /** @perfcnt: Device performance counters data. */ + struct panthor_perfcnt *perfcnt; + /** @reset: Reset related fields. */ struct { /** @wq: Ordered workqueue used to schedule reset operations. */ diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index ff484506229f..6cb9ea0aa553 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -27,6 +27,7 @@ #include "panthor_mmu.h" #include "panthor_regs.h" #include "panthor_sched.h"
[PATCH v2 1/1] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 70 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 120 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..889527b71b9d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/.../drivers/panfrost/.../drm/../profiling/status +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: + Get/set drm fdinfo's engine and cycles profiling status. + Valid values are: + 0: Don't enable fdinfo job profiling sources. + 1: Enable fdinfo job profiling sources, this enables both the GPU's + timestamp and cycle counter registers. 
\ No newline at end of file diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..be4ac282ef63 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..6e718595d8a6 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -10,8 +10,7 @@ panfrost-y := \ panfrost_job.o \ panfrost_mmu.o \ panfrost_perfcnt.o \ - panfrost_dump.o - -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + panfrost_dump.o \ + panfrost_sysfs.o obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/dr
[PATCH v2 0/1] drm/panfrost: Replace fdinfo's profiling debugfs knob
This is v2 of the patch already discussed in [1] Changelog: - Turned the profile mode atomic variable into a boolean - Rewrote the sysfs file's uAPI documentation to make it more generic - Improved the casting of the profiling variable inside the Panfrost device structure [1]https://lore.kernel.org/dri-devel/20240221161237.2478193-1-adrian.laru...@collabora.com/ Adrián Larumbe (1): drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 70 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 120 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h base-commit: 216c1282dde38ca87ebdf1ccacee5a0682901574 -- 2.43.0
Re: [PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Hi Boris, On 26.02.2024 09:51, Boris Brezillon wrote: > On Wed, 21 Feb 2024 16:12:32 + > Adrián Larumbe wrote: > > > Debugfs isn't always available in production builds that try to squeeze > > every single byte out of the kernel image, but we still need a way to > > toggle the timestamp and cycle counter registers so that jobs can be > > profiled for fdinfo's drm engine and cycle calculations. > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > the same functionality, and document its ABI in a separate file. > > > > Signed-off-by: Adrián Larumbe > > --- > > .../testing/sysfs-driver-panfrost-profiling | 10 +++ > > Documentation/gpu/panfrost.rst| 9 +++ > > drivers/gpu/drm/panfrost/Makefile | 5 +- > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 > > drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ > > drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 > > 10 files changed, 124 insertions(+), 45 deletions(-) > > create mode 100644 > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > new file mode 100644 > > index ..ce54069714f3 > > --- /dev/null > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > @@ -0,0 +1,10 @@ > > +What: > > /sys/bus/.../drivers/panfrost/.../drm/../profiling/status > > +Date: February 2024 > > +KernelVersion: 6.8.0 > > +Contact: Adrian Larumbe > > +Description: > > +Get/set 
drm fdinfo's engine and cycles profiling status. > > +Valid values are: > > + 0: Disable fdinfo job profiling sources. This disables both the > > GPU's > > +timestamp and cycle counter registers. > > + 1: Enable the above. > > diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst > > index b80e41f4b2c5..be4ac282ef63 100644 > > --- a/Documentation/gpu/panfrost.rst > > +++ b/Documentation/gpu/panfrost.rst > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > `drm-curfreq-` values convey the current operating frequency for that > > engine. > > + > > +Users must bear in mind that engine and cycle sampling are disabled by > > default, > > +because of power saving concerns. `fdinfo` users and benchmark > > applications which > > +query the fdinfo file must make sure to toggle the job profiling status of > > the > > +driver by writing into the appropriate sysfs node:: > > + > > +echo > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling > > + > > +Where `N` is either `0` or `1`, depending on the desired enablement status. > > diff --git a/drivers/gpu/drm/panfrost/Makefile > > b/drivers/gpu/drm/panfrost/Makefile > > index 2c01c1e7523e..6e718595d8a6 100644 > > --- a/drivers/gpu/drm/panfrost/Makefile > > +++ b/drivers/gpu/drm/panfrost/Makefile > > @@ -10,8 +10,7 @@ panfrost-y := \ > > panfrost_job.o \ > > panfrost_mmu.o \ > > panfrost_perfcnt.o \ > > - panfrost_dump.o > > - > > -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o > > + panfrost_dump.o \ > > + panfrost_sysfs.o > > > > obj-$(CONFIG_DRM_PANFROST) += panfrost.o > > diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > b/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > deleted file mode 100644 > > index 72d4286a6bf7.. 
> > --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c > > +++ /dev/null > > @@ -1,21 +0,0 @@ > > -// SPDX-License-Identifier: GPL-2.0 > > -/* Copyright 2023 Collabora ltd. */ > > -/* Copyright 2023 Amazon.com, Inc. or its affiliates. */ > > - > > -#include > > -#include
Re: [PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Hi Steve, On 21.02.2024 16:52, Steven Price wrote: > On 21/02/2024 16:12, Adrián Larumbe wrote: > > Debugfs isn't always available in production builds that try to squeeze > > every single byte out of the kernel image, but we still need a way to > > toggle the timestamp and cycle counter registers so that jobs can be > > profiled for fdinfo's drm engine and cycle calculations. > > > > Drop the debugfs knob and replace it with a sysfs file that accomplishes > > the same functionality, and document its ABI in a separate file. > > > > Signed-off-by: Adrián Larumbe > > --- > > .../testing/sysfs-driver-panfrost-profiling | 10 +++ > > Documentation/gpu/panfrost.rst| 9 +++ > > drivers/gpu/drm/panfrost/Makefile | 5 +- > > drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- > > drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 > > drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- > > drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- > > drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- > > drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ > > drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 > > 10 files changed, 124 insertions(+), 45 deletions(-) > > create mode 100644 > > Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c > > delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c > > create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h > > > > diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > new file mode 100644 > > index ..ce54069714f3 > > --- /dev/null > > +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling > > @@ -0,0 +1,10 @@ > > +What: > > /sys/bus/.../drivers/panfrost/.../drm/../profiling/status > > +Date: February 2024 > > +KernelVersion: 6.8.0 > > +Contact: Adrian Larumbe > > +Description: > > +Get/set drm fdinfo's 
engine and cycles profiling status. > > +Valid values are: > > + 0: Disable fdinfo job profiling sources. This disables both the > > GPU's > > +timestamp and cycle counter registers. > > + 1: Enable the above. > > Minor point, but if we're going to eventually come up with a generic way > of doing this, then we're going to have to think about backwards > compatibility for this sysfs file. I would expect in this new world '0' > would mean "default behaviour; off unless the new-fangled thing enables > profiling" and '1' means "force on". > > In which case perhaps wording like the below would be clearer: > > 0: Don't enable fdinfo job profiling sources. > 1: Enable fdinfo job profiling sources, this enables both the GPU's >timestamp and cycle counter registers. > > Or am I being too picky over the wording ;) I'm alright with this kind of wording, to keep things as generic as possible. Initially I thought just mentioning 0 and 1 as potential toggle values would be enough, and then every driver could describe their own profiling/status sysfs knob in similar terms, depending on what profiling resources they act upon. > One other small issue below... > > > diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst > > index b80e41f4b2c5..be4ac282ef63 100644 > > --- a/Documentation/gpu/panfrost.rst > > +++ b/Documentation/gpu/panfrost.rst > > @@ -38,3 +38,12 @@ the currently possible format options: > > > > Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. > > `drm-curfreq-` values convey the current operating frequency for that > > engine. > > + > > +Users must bear in mind that engine and cycle sampling are disabled by > > default, > > +because of power saving concerns. 
`fdinfo` users and benchmark > > applications which > > +query the fdinfo file must make sure to toggle the job profiling status of > > the > > +driver by writing into the appropriate sysfs node:: > > + > > +echo > > > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling > > + > > +Where `N` is either `0` or `1`, depending on the desired enablement status. > > diff --git a/drivers/gpu/drm/panfrost/Makefile > > b/drivers/gpu/drm/panfrost/Makefile > > index 2c01c1e7523e..6e718595d8a6 10064
[PATCH] drm/panfrost: Replace fdinfo's profiling debugfs knob with sysfs
Debugfs isn't always available in production builds that try to squeeze every single byte out of the kernel image, but we still need a way to toggle the timestamp and cycle counter registers so that jobs can be profiled for fdinfo's drm engine and cycle calculations. Drop the debugfs knob and replace it with a sysfs file that accomplishes the same functionality, and document its ABI in a separate file. Signed-off-by: Adrián Larumbe --- .../testing/sysfs-driver-panfrost-profiling | 10 +++ Documentation/gpu/panfrost.rst| 9 +++ drivers/gpu/drm/panfrost/Makefile | 5 +- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h| 5 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 14 ++-- drivers/gpu/drm/panfrost/panfrost_job.c | 2 +- drivers/gpu/drm/panfrost/panfrost_sysfs.c | 74 +++ drivers/gpu/drm/panfrost/panfrost_sysfs.h | 15 10 files changed, 124 insertions(+), 45 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-driver-panfrost-profiling delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_sysfs.h diff --git a/Documentation/ABI/testing/sysfs-driver-panfrost-profiling b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling new file mode 100644 index ..ce54069714f3 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-driver-panfrost-profiling @@ -0,0 +1,10 @@ +What: /sys/bus/.../drivers/panfrost/.../drm/../profiling/status +Date: February 2024 +KernelVersion: 6.8.0 +Contact: Adrian Larumbe +Description: +Get/set drm fdinfo's engine and cycles profiling status. +Valid values are: + 0: Disable fdinfo job profiling sources. This disables both the GPU's +timestamp and cycle counter registers. + 1: Enable the above. 
diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index b80e41f4b2c5..be4ac282ef63 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -38,3 +38,12 @@ the currently possible format options: Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. `drm-curfreq-` values convey the current operating frequency for that engine. + +Users must bear in mind that engine and cycle sampling are disabled by default, +because of power saving concerns. `fdinfo` users and benchmark applications which +query the fdinfo file must make sure to toggle the job profiling status of the +driver by writing into the appropriate sysfs node:: + +echo <N> > /sys/bus/platform/drivers/panfrost/[a-f0-9]*.gpu/drm/card1/profiling + +Where `N` is either `0` or `1`, depending on the desired enablement status. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..6e718595d8a6 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -10,8 +10,7 @@ panfrost-y := \ panfrost_job.o \ panfrost_mmu.o \ panfrost_perfcnt.o \ - panfrost_dump.o - -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + panfrost_dump.o \ + panfrost_sysfs.o obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/driver
Re: [PATCH 0/1] Always record job cycle and timestamp information
> On 21.02.2024 14:34, Tvrtko Ursulin wrote: > > On 21/02/2024 09:40, Adrián Larumbe wrote: > > Hi, > > > > I just wanted to make sure we're on the same page on this matter. So in > > Panfrost, and I guess in almost every other single driver out there, HW perf > > counters and their uapi interface are orthogonal to fdinfo's reporting on > > drm > > engine utilisation. > > > > At the moment it seems like HW perfcounters and the way they're exposed to > > UM > > are very idiosyncratic and any attempt to unify their interface into a > > common > > set of ioctl's sounds like a gargantuan task I wouldn't like to be faced > > with. > > I share the same feeling on this sub-topic. > > > As for fdinfo, I guess there's more room for coming up with common helpers > > that > > could handle the toggling of HW support for drm engine calculations, but > > I'd at > > least have to see how things are being done in let's say, Freedreno or > > Intel. > > For Intel we don't need this ability, well at least for pre-GuC platforms. > Stat collection is super cheap and permanently enabled there. > > But let me copy Umesh because something at the back of my mind is telling me > that perhaps there was something expensive about collecting these stats with > the GuC backend? If so maybe a toggle would be beneficial there. > > > Right now there's a pressing need to get rid of the debugfs knob for > > fdinfo's > > drm engine profiling sources in Panfrost, after which I could perhaps draw > > up an > > RFC for how to generalise this onto other drivers. > > There is a knob currently meaning fdinfo does not work by default? If that is > so, I would have at least expected someone had submitted a patch for gputop to > handle this toggle. It being kind of a common reference implementation I don't > think it is great if it does not work out of the box. It does sound like I forgot to document this knob at the time I submitted fdinfo support for Panfrost. 
I'll make a point of mentioning it in a new patch where I drop debugfs support and enable toggling from sysfs instead. > The toggle as an idea sounds a bit annoying, but if there is no other > realistic way maybe it is not too bad. As long as it is documented in the > drm-usage-stats.rst, doesn't live in debugfs, and has some common plumbing > implemented both on the kernel side and for the aforementioned gputop / > igt_drm_fdinfo / igt_drm_clients. Where and how exactly TBD. As soon as the new patch is merged, I'll go and reflect the driver uAPI changes in all three of these. > Regards, > > Tvrtko > Cheers, Adrian > > On 16.02.2024 17:43, Tvrtko Ursulin wrote: > > > > > > On 16/02/2024 16:57, Daniel Vetter wrote: > > > > On Wed, Feb 14, 2024 at 01:52:05PM +, Steven Price wrote: > > > > > Hi Adrián, > > > > > > > > > > On 14/02/2024 12:14, Adrián Larumbe wrote: > > > > > > A driver user expressed interest in being able to access engine > > > > > > usage stats > > > > > > through fdinfo when debugfs is not built into their kernel. In the > > > > > > current > > > > > > implementation, this wasn't possible, because it was assumed even > > > > > > for > > > > > > inflight jobs enabling the cycle counter and timestamp registers > > > > > > would > > > > > > incur in additional power consumption, so both were kept disabled > > > > > > until > > > > > > toggled through debugfs. > > > > > > > > > > > > A second read of the TRM made me think otherwise, but this is > > > > > > something > > > > > > that would be best clarified by someone from ARM's side. > > > > > > > > > > I'm afraid I can't give a definitive answer. This will probably vary > > > > > depending on implementation. The command register enables/disables > > > > > "propagation" of the cycle/timestamp values. This propagation will > > > > > cost > > > > > some power (gates are getting toggled) but whether that power is > > > > > completely in the noise of the GPU as a whole I can't say. 
> > > > > > > > > > The out-of-tree kbase driver only enables the counters for jobs > > > > > explicitly marked (BASE_JD_REQ_PERMON) or due to an explicit > > > > > connection > > > > > from a profiler. > > > > > > > > > > I
Re: [PATCH 0/1] Always record job cycle and timestamp information
Hi, I just wanted to make sure we're on the same page on this matter. So in Panfrost, and I guess in almost every other single driver out there, HW perf counters and their uapi interface are orthogonal to fdinfo's reporting on drm engine utilisation. At the moment it seems like HW perfcounters and the way they're exposed to UM are very idiosyncratic and any attempt to unify their interface into a common set of ioctl's sounds like a gargantuan task I wouldn't like to be faced with. As for fdinfo, I guess there's more room for coming up with common helpers that could handle the toggling of HW support for drm engine calculations, but I'd at least have to see how things are being done in let's say, Freedreno or Intel. Right now there's a pressing need to get rid of the debugfs knob for fdinfo's drm engine profiling sources in Panfrost, after which I could perhaps draw up an RFC for how to generalise this onto other drivers. Adrian On 16.02.2024 17:43, Tvrtko Ursulin wrote: > > On 16/02/2024 16:57, Daniel Vetter wrote: > > On Wed, Feb 14, 2024 at 01:52:05PM +, Steven Price wrote: > > > Hi Adrián, > > > > > > On 14/02/2024 12:14, Adrián Larumbe wrote: > > > > A driver user expressed interest in being able to access engine usage > > > > stats > > > > through fdinfo when debugfs is not built into their kernel. In the > > > > current > > > > implementation, this wasn't possible, because it was assumed even for > > > > inflight jobs enabling the cycle counter and timestamp registers would > > > > incur additional power consumption, so both were kept disabled until > > > > toggled through debugfs. > > > > > > > > A second read of the TRM made me think otherwise, but this is something > > > > that would be best clarified by someone from ARM's side. > > > > > > I'm afraid I can't give a definitive answer. This will probably vary > > > depending on implementation. The command register enables/disables > > > "propagation" of the cycle/timestamp values. 
This propagation will cost > > > some power (gates are getting toggled) but whether that power is > > > completely in the noise of the GPU as a whole I can't say. > > > > > > The out-of-tree kbase driver only enables the counters for jobs > > > explicitly marked (BASE_JD_REQ_PERMON) or due to an explicit connection > > > from a profiler. > > > > > > I'd be happier moving the debugfs file to sysfs rather than assuming > > > that the power consumption is small enough for all platforms. > > > > > > Ideally we'd have some sort of kernel interface for a profiler to inform > > > the kernel what it is interested in, but I can't immediately see how to > > > make that useful across different drivers. kbase's profiling support is > > > great with our profiling tools, but there's a very strong connection > > > between the two. > > > > Yeah I'm not sure whether a magic (worse probably per-driver massively > > different) file in sysfs is needed to enable gpu perf monitoring stats in > > fdinfo. > > > > I get that we do have a bit a gap because the linux perf pmu stuff is > > global, and you want per-process, and there's kinda no per-process support > > for perf stats for devices. But that's probably the direction we want to > > go, not so much fdinfo. At least for hardware performance counters and > > things like that. > > > > Iirc the i915 pmu support had some integration for per-process support, > > you might want to chat with Tvrtko for kernel side and Lionel for more > > userspace side. At least if I'm not making a complete mess and my memory > > is vaguely related to reality. Adding them both. > > Yeah there are two separate things, i915 PMU and i915 Perf/OA. > > If my memory serves me right I indeed did have a per-process support for i915 > PMU implemented as an RFC (or at least a branch somewhere) some years back. > IIRC it only exposed the per engine GPU utilisation and did not find it very > useful versus the complexity. 
(I think it at least required maintaining a map > of drm clients per task.) > > Our more useful profiling is using a custom Perf/OA interface (Observation > Architecture) which is possibly similar to kbase mentioned above. Why it is a > custom interface is explained in a large comment on top of i915_perf.c. Not > sure if all of them still hold but on the overall perf does not sound like the > right fit for detailed GPU profiling. > > Also PMU drivers are very challenging to get the implementation right, since > locking model an
[PATCH 0/1] Always record job cycle and timestamp information
A driver user expressed interest in being able to access engine usage stats through fdinfo when debugfs is not built into their kernel. In the current implementation, this wasn't possible, because it was assumed even for inflight jobs enabling the cycle counter and timestamp registers would incur additional power consumption, so both were kept disabled until toggled through debugfs. A second read of the TRM made me think otherwise, but this is something that would be best clarified by someone from ARM's side. Adrián Larumbe (1): drm/panfrost: Always record job cycle and timestamp information drivers/gpu/drm/panfrost/Makefile | 2 -- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h | 1 - drivers/gpu/drm/panfrost/panfrost_drv.c | 5 - drivers/gpu/drm/panfrost/panfrost_job.c | 24 - drivers/gpu/drm/panfrost/panfrost_job.h | 1 - 7 files changed, 9 insertions(+), 59 deletions(-) delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: 6b1f93ea345947c94bf3a7a6e668a2acfd310918 -- 2.43.0
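For context on the consumer side: tools like gputop and nvtop sample /proc/<pid>/fdinfo/<fd> periodically and diff the cumulative counters. A rough sketch of that userspace computation, with hypothetical helper names and synthetic input, assuming the `drm-engine-<name>: <ns> ns` key format from drm-usage-stats.rst:

```python
def parse_fdinfo(text):
    """Split a DRM fdinfo blob into a {key: value} dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats

def engine_busy_percent(before, after, engine, wall_ns):
    """Busy percentage of one engine between two samples wall_ns apart.

    drm-engine-<name> values are cumulative busy nanoseconds, so the
    delta over the sampling interval gives the busy fraction.
    """
    key = "drm-engine-" + engine
    busy_ns = int(after[key].split()[0]) - int(before[key].split()[0])
    return 100.0 * busy_ns / wall_ns

before = parse_fdinfo("drm-driver: panfrost\ndrm-engine-fragment: 1000 ns\n")
after = parse_fdinfo("drm-driver: panfrost\ndrm-engine-fragment: 6000 ns\n")
print(engine_busy_percent(before, after, "fragment", 10_000))  # 5000 of 10000 ns busy -> 50.0
```

The same delta-over-interval approach applies to `drm-cycles-<engine>` once the cycle counters in this series are always recorded.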
[PATCH 1/1] drm/panfrost: Always record job cycle and timestamp information
Some users of Panfrost expressed interest in being able to gather fdinfo stats for running jobs, on production builds with no built-in debugfs support. Sysfs was first considered, but eventually it was realised timestamp and cycle counting don't incur additional power consumption when the GPU is running and there are inflight jobs, so there's no reason to let user space toggle profiling. Remove the profiling debugfs knob altogether so that cycle and timestamp counting is always enabled for inflight jobs. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/Makefile | 2 -- drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 -- drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 drivers/gpu/drm/panfrost/panfrost_device.h | 1 - drivers/gpu/drm/panfrost/panfrost_drv.c | 5 - drivers/gpu/drm/panfrost/panfrost_job.c | 24 - drivers/gpu/drm/panfrost/panfrost_job.h | 1 - 7 files changed, 9 insertions(+), 59 deletions(-) delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c delete mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 2c01c1e7523e..7da2b3f02ed9 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,6 +12,4 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o -panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o - obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c deleted file mode 100644 index 72d4286a6bf7.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.c +++ /dev/null @@ -1,21 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -/* Copyright 2023 Collabora ltd. */ -/* Copyright 2023 Amazon.com, Inc. or its affiliates. 
*/ - -#include -#include -#include -#include -#include - -#include "panfrost_device.h" -#include "panfrost_gpu.h" -#include "panfrost_debugfs.h" - -void panfrost_debugfs_init(struct drm_minor *minor) -{ - struct drm_device *dev = minor->dev; - struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); - - debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); -} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h deleted file mode 100644 index c5af5f35877f.. --- a/drivers/gpu/drm/panfrost/panfrost_debugfs.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * Copyright 2023 Collabora ltd. - * Copyright 2023 Amazon.com, Inc. or its affiliates. - */ - -#ifndef PANFROST_DEBUGFS_H -#define PANFROST_DEBUGFS_H - -#ifdef CONFIG_DEBUG_FS -void panfrost_debugfs_init(struct drm_minor *minor); -#endif - -#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h index 62f7e3527385..cd6bbcb2bea4 100644 --- a/drivers/gpu/drm/panfrost/panfrost_device.h +++ b/drivers/gpu/drm/panfrost/panfrost_device.h @@ -130,7 +130,6 @@ struct panfrost_device { struct list_head scheduled_jobs; struct panfrost_perfcnt *perfcnt; - atomic_t profile_mode; struct mutex sched_lock; diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index a926d71e8131..e31fd4d62bbe 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -20,7 +20,6 @@ #include "panfrost_job.h" #include "panfrost_gpu.h" #include "panfrost_perfcnt.h" -#include "panfrost_debugfs.h" static bool unstable_ioctls; module_param_unsafe(unstable_ioctls, bool, 0600); @@ -600,10 +599,6 @@ static const struct drm_driver panfrost_drm_driver = { .gem_create_object = panfrost_gem_create_object, .gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table, - -#ifdef 
CONFIG_DEBUG_FS - .debugfs_init = panfrost_debugfs_init, -#endif }; static int panfrost_probe(struct platform_device *pdev) diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 0c2dbf6ef2a5..745b16a77edd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -159,13 +159,11 @@ panfrost_dequeue_job(struct panfrost_device *pfdev, int slot) struct panfrost_job *job = pfdev->jobs[slot][0]; WARN_ON(!job); - if (job->is_profiled) { - if (job->engine_usage) { - job->engine_usage->elapsed_ns[slot] += - ktime_to_ns(ktime_sub(ktime_get(), job->start_time)); - job->engine_usage->cycles[slot] += - panfrost_cy
[PATCH 2/2] drm/panfrost: Fix incorrect updating of current device frequency
It was noticed that, after setting Panfrost's devfreq to the performance governor, the GPU frequency as reported by fdinfo dropped to 0 permanently. There are two separate issues causing this behaviour:
- Not initialising the device's current_frequency variable to its original value during device probe().
- Updating said variable in Panfrost devfreq's get_dev_status() rather than after the new OPP's frequency had been retrieved in target(), which meant the old frequency would be assigned instead.
Signed-off-by: Adrián Larumbe Fixes: f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") --- drivers/gpu/drm/panfrost/panfrost_devfreq.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index f59c82ea8870..2d30da38c2c3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -29,14 +29,20 @@ static void panfrost_devfreq_update_utilization(struct panfrost_devfreq *pfdevfr static int panfrost_devfreq_target(struct device *dev, unsigned long *freq, u32 flags) { + struct panfrost_device *ptdev = dev_get_drvdata(dev); struct dev_pm_opp *opp; + int err; opp = devfreq_recommended_opp(dev, freq, flags); if (IS_ERR(opp)) return PTR_ERR(opp); dev_pm_opp_put(opp); - return dev_pm_opp_set_rate(dev, *freq); + err = dev_pm_opp_set_rate(dev, *freq); + if (!err) + ptdev->pfdevfreq.current_frequency = *freq; + + return err; } static void panfrost_devfreq_reset(struct panfrost_devfreq *pfdevfreq) @@ -58,7 +64,6 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); - pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -164,6 +169,14 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) 
panfrost_devfreq_profile.initial_freq = cur_freq; + /* +* We could wait until panfrost_devfreq_target() to set this value, but +* since the simple_ondemand governor works asynchronously, there's a +* chance by the time someone opens the device's fdinfo file, current +* frequency hasn't been updated yet, so let's just do an early set. +*/ + pfdevfreq->current_frequency = cur_freq; + /* * Set the recommend OPP this will enable and configure the regulator * if any and will avoid a switch off by regulator_late_cleanup() -- 2.42.0
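The interaction the two hunks above fix can be pictured with a toy userspace model (hypothetical names, not driver code): without the probe-time seed the cached value starts at 0, and updating it from get_dev_status() only ever records the frequency devfreq hands in, which trails the last target() transition:

```python
class FreqModel:
    """Toy model of the cached current_frequency, before/after the fix."""
    def __init__(self, probe_freq, fixed):
        self.fixed = fixed
        # Fix 1: seed the cache with the probe-time clock rate.
        self.current_frequency = probe_freq if fixed else 0

    def target(self, new_freq):
        # Fix 2: update the cache where the new rate is actually set...
        if self.fixed:
            self.current_frequency = new_freq

    def get_dev_status(self, previous_freq):
        # ...instead of here, where only the *old* frequency is visible.
        if not self.fixed:
            self.current_frequency = previous_freq
        return self.current_frequency

buggy = FreqModel(500_000_000, fixed=False)
ok = FreqModel(500_000_000, fixed=True)
# fdinfo is read before any DVFS transition has happened:
print(buggy.get_dev_status(previous_freq=0))  # -> 0 (frequency "dropped to 0")
print(ok.get_dev_status(previous_freq=0))     # -> 500000000
```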
[PATCH 0/2] Panfrost devfreq and GEM status fixes
During recent development of the Mali CSF GPU Panthor driver, a user noticed that GPU frequency values as reported by fdinfo were incorrect. This was traced back to incorrect handling of frequency value updates. The same problem was seen in Panfrost. Also, GEM objects created from a dma-buf import should be considered resident in system memory, so that this can be reflected in fdinfo. Adrián Larumbe (2): drm/panfrost: Consider dma-buf imported objects as resident drm/panfrost: Fix incorrect updating of current device frequency drivers/gpu/drm/panfrost/panfrost_devfreq.c | 17 +++-- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 2 files changed, 16 insertions(+), 3 deletions(-) base-commit: 38f922a563aac3148ac73e73689805917f034cb5 -- 2.42.0
[PATCH 1/2] drm/panfrost: Consider dma-buf imported objects as resident
A GEM object constructed from a dma-buf imported sgtable should be regarded as being memory resident, because the dma-buf API mandates backing storage to be allocated when attachment succeeds. Signed-off-by: Adrián Larumbe Fixes: 9ccdac7aa822 ("drm/panfrost: Add fdinfo support for memory stats") Reported-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_gem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 0cf64456e29a..d47b40b82b0b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -200,7 +200,7 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj struct panfrost_gem_object *bo = to_panfrost_bo(obj); enum drm_gem_object_status res = 0; - if (bo->base.pages) + if (bo->base.base.import_attach || bo->base.pages) res |= DRM_GEM_OBJECT_RESIDENT; if (bo->base.madv == PANFROST_MADV_DONTNEED) -- 2.42.0
Re: [PATCH] drm/panfrost: Remove incorrect IS_ERR() check
On 20.10.2023 11:44, Steven Price wrote: > sg_page_iter_page() doesn't return an error code, so the IS_ERR() check > is wrong and the error path will never be executed. This also allows > simplifying the code to remove the local variable 'page'. > > CC: Adrián Larumbe > Reported-by: Dan Carpenter > Closes: > https://lore.kernel.org/r/376713ff-9a4f-4ea3-b097-fb5efb685d95@moroto.mountain > Signed-off-by: Steven Price Reviewed-by: Adrián Larumbe Tested-by: Adrián Larumbe > --- > drivers/gpu/drm/panfrost/panfrost_dump.c | 12 ++-- > 1 file changed, 2 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_dump.c > b/drivers/gpu/drm/panfrost/panfrost_dump.c > index e7942ac449c6..47751302f1bc 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_dump.c > +++ b/drivers/gpu/drm/panfrost/panfrost_dump.c > @@ -220,16 +220,8 @@ void panfrost_core_dump(struct panfrost_job *job) > > iter.hdr->bomap.data[0] = bomap - bomap_start; > > - for_each_sgtable_page(bo->base.sgt, &page_iter, 0) { > - struct page *page = sg_page_iter_page(&page_iter); > - > - if (!IS_ERR(page)) { > - *bomap++ = page_to_phys(page); > - } else { > - dev_err(pfdev->dev, "Panfrost Dump: wrong > page\n"); > - *bomap++ = 0; > - } > - } > + for_each_sgtable_page(bo->base.sgt, &page_iter, 0) > + *bomap++ = page_to_phys(sg_page_iter_page(&page_iter)); > > iter.hdr->bomap.iova = mapping->mmnode.start << PAGE_SHIFT; > > -- > 2.34.1
[PATCH] Documentation/gpu: fix Panfrost documentation build warnings
Fix issues revealed by `make htmldocs` after adding Panfrost DRM documentation file. Signed-off-by: Adrián Larumbe Fixes: d124dac2089c ("drm/panfrost: Add fdinfo support GPU load metrics") Reported-by: kernel test robot Closes: https://lore.kernel.org/oe-kbuild-all/202310030917.txzlpoeq-...@intel.com --- Documentation/gpu/drivers.rst | 1 + Documentation/gpu/panfrost.rst | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/Documentation/gpu/drivers.rst b/Documentation/gpu/drivers.rst index 3a52f48215a3..45a12e552091 100644 --- a/Documentation/gpu/drivers.rst +++ b/Documentation/gpu/drivers.rst @@ -18,6 +18,7 @@ GPU Driver Documentation xen-front afbc komeda-kms + panfrost .. only:: subproject and html diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst index ecc48ba5ac11..a07f6821e994 100644 --- a/Documentation/gpu/panfrost.rst +++ b/Documentation/gpu/panfrost.rst @@ -5,7 +5,7 @@ .. _panfrost-usage-stats: Panfrost DRM client usage stats implementation -== +== The drm/Panfrost driver implements the DRM client usage stats specification as documented in :ref:`drm-client-usage-stats`. -- 2.42.0
[PATCH v8 3/5] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ 2 files changed, 17 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 97e5bc4a82c8..b834777b409b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -568,6 +568,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..de238b71b321 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,20 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + if (bo->base.pages) + res |= DRM_GEM_OBJECT_RESIDENT; + + if (bo->base.madv == PANFROST_MADV_DONTNEED) + res |= DRM_GEM_OBJECT_PURGEABLE; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +220,7 @@ static const 
struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v8 5/5] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index de238b71b321..0cf64456e29a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -209,6 +209,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } + + return 0; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -221,6 +235,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h 
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..846dd697c410 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
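A small userspace model of the heap BO accounting introduced above (hypothetical names; the 2 MiB step mirrors the SZ_2M increment in the GPU page fault handler):

```python
SZ_2M = 2 << 20

class HeapBo:
    """Toy model of a Panfrost heap BO's on-demand backing."""
    def __init__(self, virtual_size=128 << 20):
        self.size = virtual_size      # full GPU VA reservation
        self.heap_rss_size = 0        # physically backed portion

    def gpu_page_fault(self):
        # Mirrors bo->heap_rss_size += SZ_2M in the fault handler,
        # capped at the virtual size for the sake of the model.
        self.heap_rss_size = min(self.heap_rss_size + SZ_2M, self.size)

    def rss(self):
        # What the .rss() GEM hook would report for a heap object.
        return self.heap_rss_size

bo = HeapBo()
for _ in range(3):
    bo.gpu_page_fault()
print(bo.rss() >> 20, "MiB of", bo.size >> 20, "MiB")  # 3 faults -> 6 MiB of 128 MiB
```

With three faults served, the rss() hook reports 6 MiB rather than the 128 MiB virtual reservation.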
[PATCH v8 4/5] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return more accurate RSS and purgeable sizes for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/drm_file.c | 8 +--- include/drm/drm_gem.h | 9 + 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..9a1bd8d0d785 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -930,6 +930,8 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) spin_lock(&file->table_lock); idr_for_each_entry (&file->object_idr, obj, id) { enum drm_gem_object_status s = 0; + size_t add_size = (obj->funcs && obj->funcs->rss) ? 
+ obj->funcs->rss(obj) : obj->size; if (obj->funcs && obj->funcs->status) { s = obj->funcs->status(obj); @@ -944,7 +946,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + status.resident += add_size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: @@ -953,14 +955,14 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) { - status.active += obj->size; + status.active += add_size; /* If still active, don't count as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (s & DRM_GEM_OBJECT_PURGEABLE) - status.purgeable += obj->size; + status.purgeable += add_size; } spin_unlock(&file->table_lock); diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
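The accounting loop in drm_show_memory_stats can be sanity-checked with a minimal userspace re-implementation (plain dicts instead of GEM objects; field names are illustrative only):

```python
def show_memory_stats(objects):
    """Userspace sketch of the drm_show_memory_stats() accounting with
    the rss hook: each object's resident/active/purgeable contribution
    uses the driver-reported RSS when present, else the full size."""
    totals = {"resident": 0, "active": 0, "purgeable": 0}
    for obj in objects:
        add_size = obj["rss"] if obj.get("rss") is not None else obj["size"]
        status = set(obj.get("status", ()))
        if "RESIDENT" in status:
            totals["resident"] += add_size
        else:
            # Already purged or not yet backed by pages: not purgeable.
            status.discard("PURGEABLE")
        if obj.get("active"):
            totals["active"] += add_size
            # Still active on the GPU: don't count as purgeable either.
            status.discard("PURGEABLE")
        if "PURGEABLE" in status:
            totals["purgeable"] += add_size
    return totals

# A 128 MiB tiler heap with only 4 MiB faulted in no longer inflates
# the resident total:
heap = {"size": 128 << 20, "rss": 4 << 20, "status": {"RESIDENT"}}
print(show_memory_stats([heap])["resident"] >> 20)  # -> 4
```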
[PATCH v8 1/5] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR 0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START 0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN 0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE 0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
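One practical note on the LO/HI register pair: a free-running 64-bit counter read as two 32-bit halves can tear if LO carries into HI between the reads. A common mitigation, sketched here in Python against a fake register sequence rather than the actual driver code, is to re-read HI until it is stable:

```python
def read_counter64(read_hi, read_lo):
    """Torn-read-safe sample of a 64-bit counter split across two
    32-bit registers: retry while HI changed around the LO read."""
    while True:
        hi = read_hi()
        lo = read_lo()
        if read_hi() == hi:
            return (hi << 32) | lo

# Fake register file where a carry lands right between the HI and LO reads.
seq = iter([0x0,          # HI before the carry
            0xFFFFFFFF,   # LO
            0x1,          # HI after the carry -> mismatch, retry
            0x1, 0x5, 0x1])  # stable second attempt
read_hi = read_lo = lambda: next(seq)
print(hex(read_counter64(read_hi, read_lo)))  # -> 0x100000005
```

Without the retry, the first attempt would have produced 0x00000000FFFFFFFF or 0x1FFFFFFFF, both far from the true value.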
[PATCH v8 2/5] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons:
- Excess time because of the delay between the end of job processing, the subsequent job IRQ, and the actual time of the sample.
- Time spent in the engine queue waiting for the GPU to pick up the next job.
To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled. The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary. Also add a documentation file explaining the possible values for fdinfo's engine keystrings and Panfrost-specific drm-curfreq- pairs. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: AngeloGioacchino Del Regno --- Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 ++ drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 58 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 14 files changed, 233 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/panfrost.rst create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index fe35a291ff3e..8d963cd7c1b7 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -169,3 +169,4 @@ Driver specific implementations --- :ref:`i915-usage-stats` +:ref:`panfrost-usage-stats` diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst new file mode 100644 index ..ecc48ba5ac11 --- /dev/null +++ b/Documentation/gpu/panfrost.rst @@ -0,0 +1,38 @@ +=== + drm/Panfrost Mali Driver +=== + +.. _panfrost-usage-stats: + +Panfrost DRM client usage stats implementation +== + +The drm/Panfrost driver implements the DRM client usage stats specification as +documented in :ref:`drm-client-usage-stats`. 
+ +Example of the output showing the implemented key value pairs and entirety of +the currently possible format options: + +:: + pos:0 + flags: 0242 + mnt_id: 27 + ino:531 + drm-driver: panfrost + drm-client-id: 14 + drm-engine-fragment:1846584880 ns + drm-cycles-fragment:1424359409 + drm-maxfreq-fragment: 79987 Hz + drm-curfreq-fragment: 79987 Hz + drm-engine-vertex-tiler:71932239 ns + drm-cycles-vertex-tiler:52617357 + drm-maxfreq-vertex-tiler: 79987 Hz + drm-curfreq-vertex-tiler: 79987 Hz + drm-total-memory: 290 MiB + drm-shared-memory: 0 MiB + drm-active-memory: 226 MiB + drm-resident-memory:36496 KiB + drm-purgeable-memory: 128 KiB + +Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. +`drm-curfreq-` values convey the current operating frequency for that engine. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y
[PATCH v8 0/5] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
 - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units.
 - Removed locking of shrinker's mutex in GEM obj status function.

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
 - Changed fdinfo engine names to something more descriptive.
 - Mentioned GPU cycle counts aren't an exact measure.
 - Handled the case when job->priv might be NULL.
 - Handled 32-bit overflow of the cycle register.
 - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers.
 - Removed special handling of Prime imported BO RSS.
 - Use rss_size only for heap objects.
 - Use bo->base.madv instead of a specific purgeable flag.
 - Fixed kernel test robot warnings.

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
 - Moved cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy.
 - Made sure cycle counter refs are released in the reset path.
 - Dropped the model param for toggling cycle counting and left it down to the debugfs file.
 - Don't disable the cycle counter when toggling the debugfs file; let the refcounting logic handle it instead.
 - Removed the fdinfo data nested structure definition and 'names' field.
 - When incrementing BO RSS size in the GPU MMU page fault IRQ handler, assume a granularity of 2 MiB for every successful mapping.
 - drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
 - Removed explicit initialisation of the atomic variable for profiling mode, as it's allocated with kzalloc.
 - Pass the engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
 - Removed double reading of the cycle counter register and ktime in the job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
 - Moved putting of the cycle counting refcount into the panfrost job dequeue function to avoid repetition.

v6: https://lore.kernel.org/lkml/c73ad42b-a8db-23c2-86c7-1a2939dba...@linux.intel.com/T/
 - Fixed wrong swapped-round engine time and cycle values in fdinfo drm print statements.

v7: https://lore.kernel.org/lkml/20230927213133.1651169-6-adrian.laru...@collabora.com/T/
 - Make sure an object's actual RSS size is added to the overall fdinfo's purgeable and active size tally when it's both resident and purgeable or active.
 - Create a drm/panfrost.rst documentation file with the meaning of fdinfo strings.
 - BUILD_BUG_ON checking the engine name array size for fdinfo.
 - Added copyright notices for Amazon in Panfrost's new debugfs files.
 - Discarded the fdinfo memory stats unit size selection patch.

v8:
 - Style improvements and addressing nitpicks.
Adrián Larumbe (5): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 + drivers/gpu/drm/drm_file.c | 8 +-- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 60 - drivers/gpu/drm/panfrost/panfrost_gem.c | 30 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h
[PATCH v7 2/5] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot.

This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately.

Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations.

It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess from the actual figure because of two reasons:
 - Excess time because of the delay between the end of a job processing, the subsequent job IRQ and the actual time of the sample.
 - Time spent in the engine queue waiting for the GPU to pick up the next job.

To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary, because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled.

The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary.

Also add a documentation file explaining the possible values for fdinfo's engine keystrings and Panfrost-specific drm-curfreq- pairs.
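For illustration, here is roughly how a monitoring tool could turn two fdinfo samples into a utilisation percentage. This is a sketch with made-up field plumbing, not actual nvtop/gputop code; the values correspond to a drm-engine-<name> counter (accumulated busy nanoseconds) read at two points in time:

```c
#include <stdint.h>

/* One fdinfo reading for a given engine. */
struct fdinfo_sample {
	uint64_t engine_busy_ns;	/* drm-engine-<name> value */
	uint64_t sampled_at_ns;		/* wall-clock time of the read */
};

/* Busy-time delta over wall-time delta, as a percentage.
 * A cycle-based estimate would instead divide the drm-cycles-<name>
 * delta by drm-curfreq-<name> * elapsed seconds, which is why the
 * current frequency is exported at all. */
static double engine_utilisation(const struct fdinfo_sample *s0,
				 const struct fdinfo_sample *s1)
{
	uint64_t elapsed = s1->sampled_at_ns - s0->sampled_at_ns;

	if (elapsed == 0)
		return 0.0;
	return 100.0 * (double)(s1->engine_busy_ns - s0->engine_busy_ns) /
	       (double)elapsed;
}
```

Because the exported busy time is over-reported for the reasons listed above (IRQ-to-sample delay, queue wait time), the resulting percentage is an upper bound on real engine usage.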
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 ++ drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 58 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 14 files changed, 233 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/panfrost.rst create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/Documentation/gpu/drm-usage-stats.rst b/Documentation/gpu/drm-usage-stats.rst index fe35a291ff3e..8d963cd7c1b7 100644 --- a/Documentation/gpu/drm-usage-stats.rst +++ b/Documentation/gpu/drm-usage-stats.rst @@ -169,3 +169,4 @@ Driver specific implementations --- :ref:`i915-usage-stats` +:ref:`panfrost-usage-stats` diff --git a/Documentation/gpu/panfrost.rst b/Documentation/gpu/panfrost.rst new file mode 100644 index ..ecc48ba5ac11 --- /dev/null +++ b/Documentation/gpu/panfrost.rst @@ -0,0 +1,38 @@ +=== + drm/Panfrost Mali Driver +=== + +.. _panfrost-usage-stats: + +Panfrost DRM client usage stats implementation +== + +The drm/Panfrost driver implements the DRM client usage stats specification as +documented in :ref:`drm-client-usage-stats`. 
+ +Example of the output showing the implemented key value pairs and entirety of +the currently possible format options: + +:: + pos:0 + flags: 0242 + mnt_id: 27 + ino:531 + drm-driver: panfrost + drm-client-id: 14 + drm-engine-fragment:1846584880 ns + drm-cycles-fragment:1424359409 + drm-maxfreq-fragment: 79987 Hz + drm-curfreq-fragment: 79987 Hz + drm-engine-vertex-tiler:71932239 ns + drm-cycles-vertex-tiler:52617357 + drm-maxfreq-vertex-tiler: 79987 Hz + drm-curfreq-vertex-tiler: 79987 Hz + drm-total-memory: 290 MiB + drm-shared-memory: 0 MiB + drm-active-memory: 226 MiB + drm-resident-memory:36496 KiB + drm-purgeable-memory: 128 KiB + +Possible `drm-engine-` key names are: `fragment`, and `vertex-tiler`. +`drm-curfreq-` values convey the current operating frequency for that engine. diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o
[PATCH v7 4/5] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless.

This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults.

Provide a new generic DRM object function that allows drivers to return more accurate RSS and purgeable sizes for their BOs.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 8 +--- include/drm/drm_gem.h | 9 + 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..9a1bd8d0d785 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -930,6 +930,8 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) spin_lock(&file->table_lock); idr_for_each_entry (&file->object_idr, obj, id) { enum drm_gem_object_status s = 0; + size_t add_size = (obj->funcs && obj->funcs->rss) ?
+ obj->funcs->rss(obj) : obj->size; if (obj->funcs && obj->funcs->status) { s = obj->funcs->status(obj); @@ -944,7 +946,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + status.resident += add_size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: @@ -953,14 +955,14 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) { - status.active += obj->size; + status.active += add_size; /* If still active, don't count as purgeable: */ s &= ~DRM_GEM_OBJECT_PURGEABLE; } if (s & DRM_GEM_OBJECT_PURGEABLE) - status.purgeable += obj->size; + status.purgeable += add_size; } spin_unlock(&file->table_lock); diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** + * @rss: + * + * Return resident size of the object in physical memory. + * + * Called by drm_show_memory_stats(). + */ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v7 5/5] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time.

This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = &drm_gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++
b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..846dd697c410 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v7 0/5] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
 - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
 - Split render engine values into fragment and vertex/tiler ones.
 - Added more fine-grained calculation of RSS size for BO's.
 - Implemented selection of drm-memory region size units.
 - Removed locking of shrinker's mutex in GEM obj status function.

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
 - Changed fdinfo engine names to something more descriptive.
 - Mentioned GPU cycle counts aren't an exact measure.
 - Handled the case when job->priv might be NULL.
 - Handled 32-bit overflow of the cycle register.
 - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers.
 - Removed special handling of Prime imported BO RSS.
 - Use rss_size only for heap objects.
 - Use bo->base.madv instead of a specific purgeable flag.
 - Fixed kernel test robot warnings.

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
 - Moved cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy.
 - Made sure cycle counter refs are released in the reset path.
 - Dropped the model param for toggling cycle counting and left it down to the debugfs file.
 - Don't disable the cycle counter when toggling the debugfs file; let the refcounting logic handle it instead.
 - Removed the fdinfo data nested structure definition and 'names' field.
 - When incrementing BO RSS size in the GPU MMU page fault IRQ handler, assume a granularity of 2 MiB for every successful mapping.
 - drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
 - Removed explicit initialisation of the atomic variable for profiling mode, as it's allocated with kzalloc.
 - Pass the engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
 - Removed double reading of the cycle counter register and ktime in the job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
 - Moved putting of the cycle counting refcount into the panfrost job dequeue function to avoid repetition.

v6: https://lore.kernel.org/lkml/c73ad42b-a8db-23c2-86c7-1a2939dba...@linux.intel.com/T/
 - Fixed wrong swapped-round engine time and cycle values in fdinfo drm print statements.

v7:
 - Make sure an object's actual RSS size is added to the overall fdinfo's purgeable and active size tally when it's both resident and purgeable or active.
 - Create a drm/panfrost.rst documentation file with the meaning of fdinfo strings.
 - BUILD_BUG_ON checking the engine name array size for fdinfo.
 - Added copyright notices for Amazon in Panfrost's new debugfs files.
 - Discarded the fdinfo memory stats unit size selection patch.
Adrián Larumbe (5): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function Documentation/gpu/drm-usage-stats.rst | 1 + Documentation/gpu/panfrost.rst | 38 + drivers/gpu/drm/drm_file.c | 8 +-- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 21 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 14 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 60 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 20 files changed, 289 insertions(+), 4 deletions(-) create mode 100644 Documentati
[PATCH v7 1/5] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
[PATCH v7 3/5] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers.

Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here.

Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 97e5bc4a82c8..b834777b409b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -568,6 +568,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ?
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
Re: [PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 21.09.2023 11:14, Tvrtko Ursulin wrote: > >On 20/09/2023 16:32, Tvrtko Ursulin wrote: >> >> On 20/09/2023 00:34, Adrián Larumbe wrote: >> > The current implementation will try to pick the highest available size >> > display unit as soon as the BO size exceeds that of the previous >> > multiplier. That can lead to loss of precision in contexts of low memory >> > usage. >> > >> > The new selection criteria try to preserve precision, whilst also >> > increasing the display unit selection threshold to render more accurate >> > values. >> > >> > Signed-off-by: Adrián Larumbe >> > Reviewed-by: Boris Brezillon >> > Reviewed-by: Steven Price >> > --- >> > drivers/gpu/drm/drm_file.c | 5 - >> > 1 file changed, 4 insertions(+), 1 deletion(-) >> > >> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> > index 762965e3d503..34cfa128ffe5 100644 >> > --- a/drivers/gpu/drm/drm_file.c >> > +++ b/drivers/gpu/drm/drm_file.c >> > @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct >> > drm_pending_event *e) >> > } >> > EXPORT_SYMBOL(drm_send_event); >> > +#define UPPER_UNIT_THRESHOLD 100 >> > + >> > static void print_size(struct drm_printer *p, const char *stat, >> > const char *region, u64 sz) >> > { >> > @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, >> > const char *stat, >> > unsigned u; >> > for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> > - if (sz < SZ_1K) >> > + if ((sz & (SZ_1K - 1)) && >> >> IS_ALIGNED worth it at all? >> >> > + sz < UPPER_UNIT_THRESHOLD * SZ_1K) >> > break; >> >> Excuse me for a late comment (I was away). I did not get what what is >> special about a ~10% threshold? Sounds to me just going with the lower >> unit, when size is not aligned to the higher one, would be better than >> sometimes precision-sometimes-not. > >FWIW both current and the threshold option make testing the feature very >annoying. How so? >So I'd really propose we simply use smaller unit when unaligned. 
Like I said in the previous reply, for drm files whose overall BO size sum is enormous but not a multiple of a MiB, this would render huge number representations in KiB. I don't find this particularly comfortable to read, and then this extra precision would mean nothing to nvtop or gputop, which would have to scale the size to their available screen dimensions when plotting them. >Regards, > >Tvrtko
Re: [PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 20.09.2023 16:32, Tvrtko Ursulin wrote: > >On 20/09/2023 00:34, Adrián Larumbe wrote: >> The current implementation will try to pick the highest available size >> display unit as soon as the BO size exceeds that of the previous >> multiplier. That can lead to loss of precision in contexts of low memory >> usage. >> >> The new selection criteria try to preserve precision, whilst also >> increasing the display unit selection threshold to render more accurate >> values. >> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/drm_file.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 762965e3d503..34cfa128ffe5 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct >> drm_pending_event *e) >> } >> EXPORT_SYMBOL(drm_send_event); >> +#define UPPER_UNIT_THRESHOLD 100 >> + >> static void print_size(struct drm_printer *p, const char *stat, >> const char *region, u64 sz) >> { >> @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char >> *stat, >> unsigned u; >> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> -if (sz < SZ_1K) >> +if ((sz & (SZ_1K - 1)) && > >IS_ALIGNED worth it at all? This could look better, yeah. >> +sz < UPPER_UNIT_THRESHOLD * SZ_1K) >> break; > >Excuse me for a late comment (I was away). I did not get what what is special >about a ~10% threshold? Sounds to me just going with the lower unit, when size >is not aligned to the higher one, would be better than sometimes >precision-sometimes-not. We had a bit of a debate over this in previous revisions of the patch. 
It all began when a Panfrost user complained that for relatively small BOs, they were losing precision in the fdinfo file because the sum of the sizes of all BOs for a drm file was in the order of MiBs, but not big enough to warrant losing accuracy when plotting them on nvtop or gputop. At first I thought of letting drivers pick their own preferred unit, but this would lead to inconsistency in the units presented in the fdinfo file across different DRM devices. Rob then suggested imposing a unit multiple threshold, while Boris made the suggestion of checking for unit size alignment to lessen precision loss. In the end Rob thought that minding both constraints was a good solution of compromise. The unit threshold was picked sort of arbitrarily, and suggested by Rob himself. The point of having it is avoiding huge number representations for BO size tallies that aren't aligned to the next unit, and also because BO size sums are scaled when plotting them on a Y axis, so complete accuracy isn't a requirement. >Regards, > >Tvrtko > >> sz = div_u64(sz, SZ_1K); >> } Adrian Larumbe
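To make the combined rule concrete, here is a small userspace model of the proposed print_size() selection logic, mirroring the hunk quoted earlier in this thread. The unit suffixes and byte-level behaviour are illustrative, not lifted verbatim from drm_file.c:

```c
#include <stdio.h>
#include <stdint.h>

#define SZ_1K 1024ULL
#define UPPER_UNIT_THRESHOLD 100ULL	/* from the patch quoted above */

/* Promote to the next larger unit only while the value is exactly
 * divisible by 1024 (no precision lost), or so large (>= 100 * 1024
 * in the current unit) that the lower unit would be unwieldy anyway. */
static void format_size(char *buf, size_t buflen, uint64_t sz)
{
	static const char *units[] = { "", " KiB", " MiB", " GiB" };
	unsigned int u;

	for (u = 0; u < sizeof(units) / sizeof(units[0]) - 1; u++) {
		if ((sz & (SZ_1K - 1)) && sz < UPPER_UNIT_THRESHOLD * SZ_1K)
			break;
		sz /= SZ_1K;
	}
	snprintf(buf, buflen, "%llu%s", (unsigned long long)sz, units[u]);
}
```

Under this rule, 36496 KiB stays in KiB (unaligned and below the threshold), while an exact 290 MiB is promoted all the way to MiB, matching the example fdinfo output shown earlier in the series.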
Re: [PATCH v6 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
On 20.09.2023 16:53, Tvrtko Ursulin wrote: > >On 20/09/2023 00:34, Adrián Larumbe wrote: >> Some BO's might be mapped onto physical memory chunkwise and on demand, >> like Panfrost's tiler heap. In this case, even though the >> drm_gem_shmem_object page array might already be allocated, only a very >> small fraction of the BO is currently backed by system memory, but >> drm_show_memory_stats will then proceed to add its entire virtual size to >> the file's total resident size regardless. >> >> This led to very unrealistic RSS sizes being reckoned for Panfrost, where >> said tiler heap buffer is initially allocated with a virtual size of 128 >> MiB, but only a small part of it will eventually be backed by system memory >> after successive GPU page faults. >> >> Provide a new DRM object generic function that would allow drivers to >> return a more accurate RSS size for their BOs. >> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/drm_file.c | 5 - >> include/drm/drm_gem.h | 9 + >> 2 files changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 883d83bc0e3d..762965e3d503 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, >> struct drm_file *file) >> } >> if (s & DRM_GEM_OBJECT_RESIDENT) { >> -status.resident += obj->size; >> +if (obj->funcs && obj->funcs->rss) >> +status.resident += obj->funcs->rss(obj); >> +else >> +status.resident += obj->size; > >Presumably you'd want the same smaller size in both active and purgeable? Or >you can end up with more in those two than in rss which would look odd. I didn't think of this. I guess when an object is both resident and purgeable, then its RSS and purgeable sizes should be the same. 
>Also, alternative to adding a new callback could be adding multiple output >parameters to the existing obj->func->status() which maybe ends up simpler due >fewer callbacks? > >Like: > > s = obj->funcs->status(obj, _status, ) > >And adjust the code flow to pick up the rss if driver signaled it supports >reporting it. I personally find having a separate object callback more readable in this case. There's also the question of what output parameter value would be used as a token that the relevant BO doesn't have an RSS different from its virtual size. I guess '0' would be alright, but this is on the assumption that this could never be a legitimate BO virtual size across all DRM drivers. I guess most of them round the size up to the nearest page multiple at BO creation time. > >Regards, > >Tvrtko > >> } else { >> /* If already purged or not yet backed by pages, don't >> * count it as purgeable: >> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h >> index bc9f6aa2f3fe..16364487fde9 100644 >> --- a/include/drm/drm_gem.h >> +++ b/include/drm/drm_gem.h >> @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { >> */ >> enum drm_gem_object_status (*status)(struct drm_gem_object *obj); >> +/** >> + * @rss: >> + * >> + * Return resident size of the object in physical memory. >> + * >> + * Called by drm_show_memory_stats(). >> + */ >> +size_t (*rss)(struct drm_gem_object *obj); >> + >> /** >> * @vm_ops: >> *
Re: [PATCH v6 2/6] drm/panfrost: Add fdinfo support GPU load metrics
On 20.09.2023 16:40, Tvrtko Ursulin wrote: >On 20/09/2023 00:34, Adrián Larumbe wrote: >> The drm-stats fdinfo tags made available to user space are drm-engine, >> drm-cycles, drm-max-freq and drm-curfreq, one per job slot. >> >> This deviates from standard practice in other DRM drivers, where a single >> set of key:value pairs is provided for the whole render engine. However, >> Panfrost has separate queues for fragment and vertex/tiler jobs, so a >> decision was made to calculate bus cycles and workload times separately. >> >> Maximum operating frequency is calculated at devfreq initialisation time. >> Current frequency is made available to user space because nvtop uses it >> when performing engine usage calculations. >> >> It is important to bear in mind that both GPU cycle and kernel time numbers >> provided are at best rough estimations, and always reported in excess from >> the actual figure because of two reasons: >> - Excess time because of the delay between the end of a job processing, >> the subsequent job IRQ and the actual time of the sample. >> - Time spent in the engine queue waiting for the GPU to pick up the next >> job. >> >> To avoid race conditions during enablement/disabling, a reference counting >> mechanism was introduced, and a job flag that tells us whether a given job >> increased the refcount. This is necessary, because user space can toggle >> cycle counting through a debugfs file, and a given job might have been in >> flight by the time cycle counting was disabled. >> >> The main goal of the debugfs cycle counter knob is letting tools like nvtop >> or IGT's gputop switch it at any time, to avoid power waste in case no >> engine usage measuring is necessary. 
>> >> Signed-off-by: Adrián Larumbe >> Reviewed-by: Boris Brezillon >> Reviewed-by: Steven Price >> --- >> drivers/gpu/drm/panfrost/Makefile | 2 + >> drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 >> drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + >> drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ >> drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ >> drivers/gpu/drm/panfrost/panfrost_device.c | 2 + >> drivers/gpu/drm/panfrost/panfrost_device.h | 13 + >> drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - >> drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ >> drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ >> drivers/gpu/drm/panfrost/panfrost_job.c | 24 + >> drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ >> 12 files changed, 191 insertions(+), 1 deletion(-) >> create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c >> create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h >> >> diff --git a/drivers/gpu/drm/panfrost/Makefile >> b/drivers/gpu/drm/panfrost/Makefile >> index 7da2b3f02ed9..2c01c1e7523e 100644 >> --- a/drivers/gpu/drm/panfrost/Makefile >> +++ b/drivers/gpu/drm/panfrost/Makefile >> @@ -12,4 +12,6 @@ panfrost-y := \ >> panfrost_perfcnt.o \ >> panfrost_dump.o >> +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o >> + >> obj-$(CONFIG_DRM_PANFROST) += panfrost.o >> diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> b/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> new file mode 100644 >> index ..cc14eccba206 >> --- /dev/null >> +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c >> @@ -0,0 +1,20 @@ >> +// SPDX-License-Identifier: GPL-2.0 >> +/* Copyright 2023 Collabora ltd. 
*/ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#include "panfrost_device.h" >> +#include "panfrost_gpu.h" >> +#include "panfrost_debugfs.h" >> + >> +void panfrost_debugfs_init(struct drm_minor *minor) >> +{ >> +struct drm_device *dev = minor->dev; >> +struct panfrost_device *pfdev = >> platform_get_drvdata(to_platform_device(dev->dev)); >> + >> +debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, >> &pfdev->profile_mode); >> +} >> diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> b/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> new file mode 100644 >> index ..db1c158bcf2f >> --- /dev/null >> +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h >> @@ -0,0 +1,13 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */
[PATCH v6 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be cat by a privileged user or accessed with IGT's gputop utility.

Changelog:

v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/

v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/
- Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets.
- Split render engine values into fragment and vertex/tiler ones.
- Added more fine-grained calculation of RSS size for BO's.
- Implemented selection of drm-memory region size units
- Removed locking of shrinker's mutex in GEM obj status function

v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/
- Changed fdinfo engine names to something more descriptive
- Mentioned GPU cycle counts aren't an exact measure
- Handled the case when job->priv might be NULL
- Handled 32-bit overflow of cycle register
- Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers
- Removed special handling of Prime imported BO RSS
- Use rss_size only for heap objects
- Use bo->base.madv instead of specific purgeable flag
- Fixed kernel test robot warnings

v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/
- Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy
- Make sure cycle counter refs are released in reset path
- Drop the model param for toggling cycle counting and leave it to the debugfs file
- Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead.
- Remove fdinfo data nested structure definition and 'names' field
- When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping.
- drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5: https://lore.kernel.org/lkml/20230914223928.2374933-1-adrian.laru...@collabora.com/
- Removed explicit initialisation of atomic variable for profiling mode, as it's allocated with kzalloc.
- Pass engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter.
- Remove double reading of cycle counter register and ktime in job dequeue function, as the scheduler will make sure these values are read over in case of requeuing.
- Moved putting of cycle counting refcnt into panfrost job dequeue function to avoid repetition.

v6:
- Fix wrong swapped-round engine time and cycle values in fdinfo drm print statements.

Adrián Larumbe (6):
  drm/panfrost: Add cycle count GPU register definitions
  drm/panfrost: Add fdinfo support GPU load metrics
  drm/panfrost: Add fdinfo support for memory stats
  drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
  drm/panfrost: Implement generic DRM object RSS reporting function
  drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

 drivers/gpu/drm/drm_file.c | 10 +++-
 drivers/gpu/drm/panfrost/Makefile | 2 +
 drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++
 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 +
 drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++
 drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++
 drivers/gpu/drm/panfrost/panfrost_device.c | 2 +
 drivers/gpu/drm/panfrost/panfrost_device.h | 13 +
 drivers/gpu/drm/panfrost/panfrost_drv.c | 59 -
 drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++
 drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++
 drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++
 drivers/gpu/drm/panfrost/panfrost_job.c | 24 +
drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 250 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
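The v4 changelog item about releasing cycle-counter references (and the "job flag that tells us whether a given job increased the refcount" mentioned in patch 2/6) can be modelled with a small toy: jobs take a reference only while profiling is enabled and remember whether they did, so a job still in flight when the debugfs knob is flipped releases exactly what it took. All names here are illustrative, not the driver's (which uses atomics in panfrost_gpu.c/panfrost_job.c):

```c
#include <stdbool.h>

/* Toy model of the cycle-counter refcounting described in the series. */
struct gpu {
	bool profile_mode;  /* the debugfs "profile" knob */
	int cycle_refcnt;   /* HW cycle counting is on while > 0 */
};

struct job {
	bool took_ref;      /* did this job bump the refcount? */
};

static void job_submit(struct gpu *g, struct job *j)
{
	/* Record at submit time whether profiling was on. */
	j->took_ref = g->profile_mode;
	if (j->took_ref)
		g->cycle_refcnt++;
}

static void job_done(struct gpu *g, struct job *j)
{
	/* Release only what this job actually took, regardless of the
	 * knob's current state. */
	if (j->took_ref)
		g->cycle_refcnt--;
}
```

Without the per-job flag, a job submitted while counting was enabled but completed after user space disabled it would either leak a reference or drop one it never took.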
[PATCH v6 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v6 2/6] drm/panfrost: Add fdinfo support GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess from the actual figure because of two reasons: - Excess time because of the delay between the end of a job processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enablement/disabling, a reference counting mechanism was introduced, and a job flag that tells us whether a given job increased the refcount. This is necessary, because user space can toggle cycle counting through a debugfs file, and a given job might have been in flight by the time cycle counting was disabled. The main goal of the debugfs cycle counter knob is letting tools like nvtop or IGT's gputop switch it at any time, to avoid power waste in case no engine usage measuring is necessary. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct therm
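A rough sketch of how the per-slot fdinfo pairs described in this patch might be rendered, using the tag names from the commit message ("drm-engine", "drm-cycles", "drm-max-freq", "drm-curfreq"). This is a userspace-style approximation with a plain buffer; the real driver formats through a struct drm_printer in panfrost_gpu_show_fdinfo(), and the exact key spellings are those of the merged driver, not this sketch:

```c
#include <stdio.h>
#include <stdint.h>

/* Format one job slot's worth of fdinfo key:value pairs into buf.
 * 'slot' would be something like "frg" or "vtx" for Panfrost's
 * fragment and vertex/tiler queues. Returns snprintf's count. */
static int show_slot(char *buf, size_t len, const char *slot,
		     uint64_t engine_ns, uint64_t cycles,
		     unsigned int maxfreq_hz, unsigned int curfreq_hz)
{
	return snprintf(buf, len,
			"drm-engine-%s:\t%llu ns\n"
			"drm-cycles-%s:\t%llu\n"
			"drm-max-freq-%s:\t%u Hz\n"
			"drm-curfreq-%s:\t%u Hz\n",
			slot, (unsigned long long)engine_ns,
			slot, (unsigned long long)cycles,
			slot, maxfreq_hz,
			slot, curfreq_hz);
}
```

Tools like nvtop and IGT's gputop diff these values between two reads of the fdinfo file to derive a utilisation percentage per queue.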
[PATCH v6 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision in contexts of low memory usage. The new selection criteria try to preserve precision, whilst also increasing the display unit selection threshold to render more accurate values. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/drm_file.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..34cfa128ffe5 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e) } EXPORT_SYMBOL(drm_send_event); +#define UPPER_UNIT_THRESHOLD 100 + static void print_size(struct drm_printer *p, const char *stat, const char *region, u64 sz) { @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if ((sz & (SZ_1K - 1)) && + sz < UPPER_UNIT_THRESHOLD * SZ_1K) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
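The patched unit-selection loop in print_size() above only promotes to a larger unit while the value is still an exact KiB multiple or at least 100 units of the next size up, so small or unaligned sizes keep their precision. A standalone re-implementation of just that loop (the kernel version iterates over a `units[]` string array; here we only return the chosen unit index):

```c
#include <stdint.h>

#define UPPER_UNIT_THRESHOLD 100
#define SZ_1K 1024ULL

/* Divide *sz down by 1024 while promotion loses no precision
 * (exact multiple) or the value is large enough (>= 100 of the
 * next unit) that the loss is negligible. Returns the unit index:
 * 0 = bytes, 1 = KiB, 2 = MiB, 3 = GiB. */
static unsigned int pick_unit(uint64_t *sz)
{
	unsigned int u;

	for (u = 0; u < 3; u++) {
		if ((*sz & (SZ_1K - 1)) &&
		    *sz < UPPER_UNIT_THRESHOLD * SZ_1K)
			break;
		*sz /= SZ_1K;
	}
	return u;
}
```

So 512 bytes stays "512" in bytes, 4096 becomes "4 KiB", and an unaligned-but-large value like 150000 bytes is still promoted to "146 KiB" because the rounding error there is under 1%.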
[PATCH v6 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ 
b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
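The accounting added by this patch can be modelled in isolation: every successful GPU page-fault mapping grows the heap BO's RSS by one 2 MiB chunk (the granularity named in the cover letter), and RSS reporting uses that tally rather than the BO's virtual size. Struct and function names below are demo stand-ins:

```c
#include <stdint.h>
#include <stddef.h>

#define SZ_2M (2u << 20)

/* Toy heap BO: fully sized virtually at creation, backed lazily. */
struct heap_bo {
	size_t virt_size;      /* e.g. 128 MiB tiler heap */
	size_t heap_rss_size;  /* grows as faults are serviced */
};

/* Models the MMU fault handler's `bo->heap_rss_size += SZ_2M;`:
 * one more 2 MiB chunk is now backed by system memory. */
static void fault_in_chunk(struct heap_bo *bo)
{
	if (bo->heap_rss_size + SZ_2M <= bo->virt_size)
		bo->heap_rss_size += SZ_2M;
}
```

After two faults, such a BO reports 4 MiB resident instead of its full 128 MiB virtual size, matching the motivation in patch 4/6.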
[PATCH v6 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
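Since GPU_CYCLE_COUNT_LO/HI expose a 64-bit counter as two 32-bit registers, a reader has to guard against a carry from LO into HI between the two accesses. One conventional pattern (a sketch, not the driver's actual read path — `reg_read()` stands in for gpu_read(), with a fake counter variable as the "hardware"):

```c
#include <stdint.h>

/* Fake hardware state for the demo; a real driver reads MMIO. */
static uint64_t counter;

static uint32_t reg_read(int hi)
{
	return hi ? (uint32_t)(counter >> 32) : (uint32_t)counter;
}

/* Wrap-safe 64-bit read of a split LO/HI counter: re-read HI until
 * it is stable around the LO read, so a carry between the accesses
 * cannot produce a torn value (e.g. old HI with wrapped-around LO). */
static uint64_t read_cycle_count(void)
{
	uint32_t hi, lo;

	do {
		hi = reg_read(1);
		lo = reg_read(0);
	} while (hi != reg_read(1));

	return ((uint64_t)hi << 32) | lo;
}
```

This is the kind of issue the v3 changelog item "Handled 32 bit overflow of cycle register" refers to: a naive `(hi << 32) | lo` with a single pair of reads can be off by 2^32 cycles around a wrap.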
[PATCH v6 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 3c93a11deab1..8cd9331ac4b8 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(>base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
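The flag computation in panfrost_gem_status() above reduces to two independent bits: purgeable if user space marked the BO MADV_DONTNEED, resident if pages are allocated. A self-contained sketch with local stand-ins for the DRM enum values and BO fields:

```c
#include <stdbool.h>

/* Local stand-ins for DRM_GEM_OBJECT_{RESIDENT,PURGEABLE} and
 * PANFROST_MADV_DONTNEED; values are illustrative. */
enum { OBJ_RESIDENT = 1 << 0, OBJ_PURGEABLE = 1 << 1 };
#define MADV_DONTNEED_DEMO 1

struct demo_bo {
	int madv;       /* madvise state set via the driver's ioctl */
	bool has_pages; /* shmem page array allocated? */
};

static unsigned int bo_status(const struct demo_bo *bo)
{
	unsigned int res = 0;

	if (bo->madv == MADV_DONTNEED_DEMO)
		res |= OBJ_PURGEABLE;
	if (bo->has_pages)
		res |= OBJ_RESIDENT;
	return res;
}
```

As the commit message notes, the madv field is read here without the shrinker mutex, so the result is a best-effort snapshot rather than a synchronised one.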
[PATCH v5 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = _gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 
+36,11 @@ struct panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v5 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision in contexts of low memory usage. The new selection criteria try to preserve precision, whilst also increasing the display unit selection threshold to render more accurate values. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_file.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..34cfa128ffe5 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e) } EXPORT_SYMBOL(drm_send_event); +#define UPPER_UNIT_THRESHOLD 100 + static void print_size(struct drm_printer *p, const char *stat, const char *region, u64 sz) { @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if ((sz & (SZ_1K - 1)) && + sz < UPPER_UNIT_THRESHOLD * SZ_1K) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
[PATCH v5 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v5 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
[PATCH v5 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index a8d02273afab..ef6563cf5f7e 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(>base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v5 2/6] drm/panfrost: Add fdinfo support for GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both the GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons: - Excess time because of the delay between the end of a job's processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enabling/disabling, a reference counting mechanism was introduced, along with a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might still be in flight by the time cycle counting is disabled. The main goal of the debugfs cycle counter knob is to let tools like nvtop or IGT's gputop toggle it at any time, to avoid wasting power when no engine usage measurement is needed. 
Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 191 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct thermal_cooling_de
[PATCH v5 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be read with cat by a privileged user or accessed with IGT's gputop utility. Changelog: v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/ v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/ - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets. - Split render engine values into fragment and vertex/tiler ones. - Added more fine-grained calculation of RSS size for BOs. - Implemented selection of drm-memory region size units - Removed locking of shrinker's mutex in GEM obj status function v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/ - Changed fdinfo engine names to something more descriptive - Mentioned GPU cycle counts aren't an exact measure - Handled the case when job->priv might be NULL - Handled 32 bit overflow of cycle register - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers - Removed special handling of Prime imported BO RSS - Use rss_size only for heap objects - Use bo->base.madv instead of specific purgeable flag - Fixed kernel test robot warnings v4: https://lore.kernel.org/lkml/20230912084044.955864-1-adrian.laru...@collabora.com/ - Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy - Make sure cycle counter refs are released in reset path - Drop the module param for toggling cycle counting and leave it down to the debugfs file - Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead. 
- Remove fdinfo data nested structure definition and 'names' field - When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping. - drm-file picks an fdinfo memory object size unit that doesn't lose precision. v5: - Removed explicit initialisation of atomic variable for profiling mode, as it's allocated with kzalloc. - Pass engine utilisation structure to jobs rather than the file context, to avoid future misuse of the latter. - Remove double reading of cycle counter register and ktime in job dequeue function, as the scheduler will make sure these values are read again in case of requeuing. - Moved putting of cycle counting refcnt into panfrost job dequeue function to avoid repetition. Adrián Larumbe (6): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support for GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats drivers/gpu/drm/drm_file.c | 10 +++- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++ drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 59 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++ drivers/gpu/drm/panfrost/panfrost_job.c | 24 + drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 250 insertions(+), 3 
deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
[PATCH v4 3/6] drm/panfrost: Add fdinfo support for memory stats
A new DRM GEM object function is added so that drm_show_memory_stats can provide more accurate memory usage numbers. Ideally, in panfrost_gem_status, the BO's purgeable flag would be checked after locking the driver's shrinker mutex, but drm_show_memory_stats takes over the drm file's object handle database spinlock, so there's potential for a race condition here. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_drv.c | 2 ++ drivers/gpu/drm/panfrost/panfrost_gem.c | 14 ++ 2 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index 2d9c115821a7..e71a89a283cd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -567,6 +567,8 @@ static void panfrost_show_fdinfo(struct drm_printer *p, struct drm_file *file) struct panfrost_device *pfdev = dev->dev_private; panfrost_gpu_show_fdinfo(pfdev, file->driver_priv, p); + + drm_show_memory_stats(p, file); } static const struct file_operations panfrost_drm_driver_fops = { diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 3c812fbd126f..7d8f83d20539 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -195,6 +195,19 @@ static int panfrost_gem_pin(struct drm_gem_object *obj) return drm_gem_shmem_pin(&bo->base); } +static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + enum drm_gem_object_status res = 0; + + res |= (bo->base.madv == PANFROST_MADV_DONTNEED) ? + DRM_GEM_OBJECT_PURGEABLE : 0; + + res |= (bo->base.pages) ? 
DRM_GEM_OBJECT_RESIDENT : 0; + + return res; +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -206,6 +219,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vmap = drm_gem_shmem_object_vmap, .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, + .status = panfrost_gem_status, .vm_ops = &drm_gem_shmem_vm_ops, }; -- 2.42.0
[PATCH v4 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
The current implementation will try to pick the highest available size display unit as soon as the BO size exceeds that of the previous multiplier. That can lead to loss of precision for BOs whose size is not a multiple of a MiB. Fix it by changing the unit selection criteria. For much bigger BOs, their size will naturally be aligned on something bigger than a 4 KiB page, so in practice it is very unlikely their display unit would default to KiB. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/drm_file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 762965e3d503..bf7d2fe46bfa 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const char *stat, unsigned u; for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { - if (sz < SZ_1K) + if (sz & (SZ_1K - 1)) break; sz = div_u64(sz, SZ_1K); } -- 2.42.0
[PATCH v4 4/6] drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
Some BO's might be mapped onto physical memory chunkwise and on demand, like Panfrost's tiler heap. In this case, even though the drm_gem_shmem_object page array might already be allocated, only a very small fraction of the BO is currently backed by system memory, but drm_show_memory_stats will then proceed to add its entire virtual size to the file's total resident size regardless. This led to very unrealistic RSS sizes being reckoned for Panfrost, where said tiler heap buffer is initially allocated with a virtual size of 128 MiB, but only a small part of it will eventually be backed by system memory after successive GPU page faults. Provide a new DRM object generic function that would allow drivers to return a more accurate RSS size for their BOs. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon --- drivers/gpu/drm/drm_file.c | 5 - include/drm/drm_gem.h | 9 + 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c index 883d83bc0e3d..762965e3d503 100644 --- a/drivers/gpu/drm/drm_file.c +++ b/drivers/gpu/drm/drm_file.c @@ -944,7 +944,10 @@ void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file) } if (s & DRM_GEM_OBJECT_RESIDENT) { - status.resident += obj->size; + if (obj->funcs && obj->funcs->rss) + status.resident += obj->funcs->rss(obj); + else + status.resident += obj->size; } else { /* If already purged or not yet backed by pages, don't * count it as purgeable: diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h index bc9f6aa2f3fe..16364487fde9 100644 --- a/include/drm/drm_gem.h +++ b/include/drm/drm_gem.h @@ -208,6 +208,15 @@ struct drm_gem_object_funcs { */ enum drm_gem_object_status (*status)(struct drm_gem_object *obj); + /** +* @rss: +* +* Return resident size of the object in physical memory. +* +* Called by drm_show_memory_stats(). +*/ + size_t (*rss)(struct drm_gem_object *obj); + /** * @vm_ops: * -- 2.42.0
[PATCH v4 2/6] drm/panfrost: Add fdinfo support for GPU load metrics
The drm-stats fdinfo tags made available to user space are drm-engine, drm-cycles, drm-max-freq and drm-curfreq, one per job slot. This deviates from standard practice in other DRM drivers, where a single set of key:value pairs is provided for the whole render engine. However, Panfrost has separate queues for fragment and vertex/tiler jobs, so a decision was made to calculate bus cycles and workload times separately. Maximum operating frequency is calculated at devfreq initialisation time. Current frequency is made available to user space because nvtop uses it when performing engine usage calculations. It is important to bear in mind that both the GPU cycle and kernel time numbers provided are at best rough estimations, and always reported in excess of the actual figure, for two reasons: - Excess time because of the delay between the end of a job's processing, the subsequent job IRQ and the actual time of the sample. - Time spent in the engine queue waiting for the GPU to pick up the next job. To avoid race conditions during enabling/disabling, a reference counting mechanism was introduced, along with a job flag that tells us whether a given job increased the refcount. This is necessary because user space can toggle cycle counting through a debugfs file, and a given job might still be in flight by the time cycle counting is disabled. The main goal of the debugfs cycle counter knob is to let tools like nvtop or IGT's gputop toggle it at any time, to avoid wasting power when no engine usage measurement is needed. 
Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 57 - drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 +++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 6 +++ drivers/gpu/drm/panfrost/panfrost_job.c | 39 ++ drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ 12 files changed, 208 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile index 7da2b3f02ed9..2c01c1e7523e 100644 --- a/drivers/gpu/drm/panfrost/Makefile +++ b/drivers/gpu/drm/panfrost/Makefile @@ -12,4 +12,6 @@ panfrost-y := \ panfrost_perfcnt.o \ panfrost_dump.o +panfrost-$(CONFIG_DEBUG_FS) += panfrost_debugfs.o + obj-$(CONFIG_DRM_PANFROST) += panfrost.o diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.c b/drivers/gpu/drm/panfrost/panfrost_debugfs.c new file mode 100644 index ..cc14eccba206 --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 2023 Collabora ltd. 
*/ + +#include +#include +#include +#include +#include + +#include "panfrost_device.h" +#include "panfrost_gpu.h" +#include "panfrost_debugfs.h" + +void panfrost_debugfs_init(struct drm_minor *minor) +{ + struct drm_device *dev = minor->dev; + struct panfrost_device *pfdev = platform_get_drvdata(to_platform_device(dev->dev)); + + debugfs_create_atomic_t("profile", 0600, minor->debugfs_root, &pfdev->profile_mode); +} diff --git a/drivers/gpu/drm/panfrost/panfrost_debugfs.h b/drivers/gpu/drm/panfrost/panfrost_debugfs.h new file mode 100644 index ..db1c158bcf2f --- /dev/null +++ b/drivers/gpu/drm/panfrost/panfrost_debugfs.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2023 Collabora ltd. + */ + +#ifndef PANFROST_DEBUGFS_H +#define PANFROST_DEBUGFS_H + +#ifdef CONFIG_DEBUG_FS +void panfrost_debugfs_init(struct drm_minor *minor); +#endif + +#endif /* PANFROST_DEBUGFS_H */ diff --git a/drivers/gpu/drm/panfrost/panfrost_devfreq.c b/drivers/gpu/drm/panfrost/panfrost_devfreq.c index 58dfb15a8757..28caffc689e2 100644 --- a/drivers/gpu/drm/panfrost/panfrost_devfreq.c +++ b/drivers/gpu/drm/panfrost/panfrost_devfreq.c @@ -58,6 +58,7 @@ static int panfrost_devfreq_get_dev_status(struct device *dev, spin_lock_irqsave(&pfdevfreq->lock, irqflags); panfrost_devfreq_update_utilization(pfdevfreq); + pfdevfreq->current_frequency = status->current_frequency; status->total_time = ktime_to_ns(ktime_add(pfdevfreq->busy_time, pfdevfreq->idle_time)); @@ -117,6 +118,7 @@ int panfrost_devfreq_init(struct panfrost_device *pfdev) struct devfreq *devfreq; struct thermal_cooling_device *cooling; struct panfrost_d
[PATCH v4 0/6] Add fdinfo support to Panfrost
This patch series adds fdinfo support to the Panfrost DRM driver. It will display a series of key:value pairs under /proc/pid/fdinfo/fd for render processes that open the Panfrost DRM file. The pairs contain basic drm gpu engine and memory region information that can either be read with cat by a privileged user or accessed with IGT's gputop utility. Changelog: v1: https://lore.kernel.org/lkml/bb52b872-e41b-3894-285e-b52cfc849...@arm.com/T/ v2: https://lore.kernel.org/lkml/20230901084457.5bc1a...@collabora.com/T/ - Changed the way gpu cycles and engine time are calculated, using GPU registers and taking into account potential resets. - Split render engine values into fragment and vertex/tiler ones. - Added more fine-grained calculation of RSS size for BOs. - Implemented selection of drm-memory region size units - Removed locking of shrinker's mutex in GEM obj status function v3: https://lore.kernel.org/lkml/20230905184533.959171-1-adrian.laru...@collabora.com/ - Changed fdinfo engine names to something more descriptive - Mentioned GPU cycle counts aren't an exact measure - Handled the case when job->priv might be NULL - Handled 32 bit overflow of cycle register - Kept fdinfo drm memory stats size unit display within 10k times the previous multiplier for more accurate BO size numbers - Removed special handling of Prime imported BO RSS - Use rss_size only for heap objects - Use bo->base.madv instead of specific purgeable flag - Fixed kernel test robot warnings v4: - Move cycle counter get and put to panfrost_job_hw_submit and panfrost_job_handle_{err,done} for more accuracy - Make sure cycle counter refs are released in reset path - Drop the module param for toggling cycle counting and leave it down to the debugfs file - Don't disable cycle counter when toggling the debugfs file, let refcounting logic handle it instead. 
- Remove fdinfo data nested structure definition and 'names' field - When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume granularity of 2MiB for every successful mapping. - drm-file picks an fdinfo memory object size unit that doesn't lose precision. Adrián Larumbe (6): drm/panfrost: Add cycle count GPU register definitions drm/panfrost: Add fdinfo support for GPU load metrics drm/panfrost: Add fdinfo support for memory stats drm/drm_file: Add DRM obj's RSS reporting function for fdinfo drm/panfrost: Implement generic DRM object RSS reporting function drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats drivers/gpu/drm/drm_file.c | 7 ++- drivers/gpu/drm/panfrost/Makefile | 2 + drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++ drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++ drivers/gpu/drm/panfrost/panfrost_device.c | 2 + drivers/gpu/drm/panfrost/panfrost_device.h | 13 + drivers/gpu/drm/panfrost/panfrost_drv.c | 59 - drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 6 +++ drivers/gpu/drm/panfrost/panfrost_job.c | 39 ++ drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/gpu/drm/panfrost/panfrost_regs.h| 5 ++ include/drm/drm_gem.h | 9 18 files changed, 264 insertions(+), 3 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd -- 2.42.0
[PATCH v4 5/6] drm/panfrost: Implement generic DRM object RSS reporting function
BO's RSS is updated every time new pages are allocated on demand and mapped for the object at GPU page fault's IRQ handler, but only for heap buffers. The reason this is unnecessary for non-heap buffers is that they are mapped onto the GPU's VA space and backed by physical memory in their entirety at BO creation time. This calculation is unnecessary for imported PRIME objects, since heap buffers cannot be exported by our driver, and the actual BO RSS size is the one reported in its attached dmabuf structure. Signed-off-by: Adrián Larumbe --- drivers/gpu/drm/panfrost/panfrost_gem.c | 15 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 5 + drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 7d8f83d20539..4365434b48db 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -208,6 +208,20 @@ static enum drm_gem_object_status panfrost_gem_status(struct drm_gem_object *obj return res; } +static size_t panfrost_gem_rss(struct drm_gem_object *obj) +{ + struct panfrost_gem_object *bo = to_panfrost_bo(obj); + + if (bo->is_heap) { + return bo->heap_rss_size; + } else if (bo->base.pages) { + WARN_ON(bo->heap_rss_size); + return bo->base.base.size; + } else { + return 0; + } +} + static const struct drm_gem_object_funcs panfrost_gem_funcs = { .free = panfrost_gem_free_object, .open = panfrost_gem_open, @@ -220,6 +234,7 @@ static const struct drm_gem_object_funcs panfrost_gem_funcs = { .vunmap = drm_gem_shmem_object_vunmap, .mmap = drm_gem_shmem_object_mmap, .status = panfrost_gem_status, + .rss = panfrost_gem_rss, .vm_ops = &drm_gem_shmem_vm_ops, }; diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index ad2877eeeccd..13c0a8149c3a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -36,6 +36,11 @@ struct 
panfrost_gem_object { */ atomic_t gpu_usecount; + /* +* Object chunk size currently mapped onto physical memory +*/ + size_t heap_rss_size; + bool noexec :1; bool is_heap:1; }; diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index d54d4e7b2195..7b1490cdaa48 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -522,6 +522,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, IOMMU_WRITE | IOMMU_READ | IOMMU_NOEXEC, sgt); bomapping->active = true; + bo->heap_rss_size += SZ_2M; dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr); -- 2.42.0
[PATCH v4 1/6] drm/panfrost: Add cycle count GPU register definitions
These GPU registers will be used when programming the cycle counter, which we need for providing accurate fdinfo drm-cycles values to user space. Signed-off-by: Adrián Larumbe Reviewed-by: Boris Brezillon Reviewed-by: Steven Price --- drivers/gpu/drm/panfrost/panfrost_regs.h | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h index 919f44ac853d..55ec807550b3 100644 --- a/drivers/gpu/drm/panfrost/panfrost_regs.h +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h @@ -46,6 +46,8 @@ #define GPU_CMD_SOFT_RESET 0x01 #define GPU_CMD_PERFCNT_CLEAR 0x03 #define GPU_CMD_PERFCNT_SAMPLE 0x04 +#define GPU_CMD_CYCLE_COUNT_START 0x05 +#define GPU_CMD_CYCLE_COUNT_STOP 0x06 #define GPU_CMD_CLEAN_CACHES 0x07 #define GPU_CMD_CLEAN_INV_CACHES 0x08 #define GPU_STATUS 0x34 @@ -73,6 +75,9 @@ #define GPU_PRFCNT_TILER_EN 0x74 #define GPU_PRFCNT_MMU_L2_EN 0x7c +#define GPU_CYCLE_COUNT_LO 0x90 +#define GPU_CYCLE_COUNT_HI 0x94 + #define GPU_THREAD_MAX_THREADS 0x0A0 /* (RO) Maximum number of threads per core */ #define GPU_THREAD_MAX_WORKGROUP_SIZE 0x0A4 /* (RO) Maximum workgroup size */ #define GPU_THREAD_MAX_BARRIER_SIZE 0x0A8 /* (RO) Maximum threads waiting at a barrier */ -- 2.42.0
Re: [PATCH v3 8/8] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats
On 06.09.2023 10:11, Boris Brezillon wrote: >On Tue, 5 Sep 2023 19:45:24 +0100 >Adrián Larumbe wrote: > >> The current implementation will try to pick the highest available size >> display unit as soon as the BO size exceeds that of the previous >> multiplier. >> >> By selecting a higher threshold, we could show more accurate size numbers. >> >> Signed-off-by: Adrián Larumbe >> --- >> drivers/gpu/drm/drm_file.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c >> index 762965e3d503..0b5fbd493e05 100644 >> --- a/drivers/gpu/drm/drm_file.c >> +++ b/drivers/gpu/drm/drm_file.c >> @@ -879,7 +879,7 @@ static void print_size(struct drm_printer *p, const char >> *stat, >> unsigned u; >> >> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { >> -if (sz < SZ_1K) >> +if (sz < (SZ_1K * 1)) >> break; > >This threshold looks a bit random. How about picking a unit that allows >us to print the size with no precision loss? > > for (u = 0; u < ARRAY_SIZE(units) - 1; u++) { > if (sz & (SZ_1K - 1)) > break; > } In this case I picked up on Rob Clark's suggestion of choosing a hard limit of perhaps 10k or 100k times the current unit before moving on to the next one. While this approach guarantees that we don't lose precision, it would render a tad too long a number in KiB for BO's that aren't a multiple of a MiB. >> sz = div_u64(sz, SZ_1K); >> }