Re: [Intel-gfx] [PATCH 5/5] drm/i915: Implement fdinfo memory stats printing

2023-09-20 Thread Rob Clark
On Wed, Sep 20, 2023 at 7:35 AM Tvrtko Ursulin
 wrote:
>
>
> On 24/08/2023 12:35, Upadhyay, Tejas wrote:
> >> -----Original Message-----
> >> From: Intel-gfx  On Behalf Of Tvrtko Ursulin
> >> Sent: Friday, July 7, 2023 6:32 PM
> >> To: Intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org
> >> Subject: [Intel-gfx] [PATCH 5/5] drm/i915: Implement fdinfo memory stats
> >> printing
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Use the newly added drm_print_memory_stats helper to show memory
> >> utilisation of our objects in drm/driver specific fdinfo output.
> >>
> >> To collect the stats we walk the per memory regions object lists and
> >> accumulate object size into the respective drm_memory_stats categories.
> >>
> >> Objects with multiple possible placements are reported in multiple regions
> >> for total and shared sizes, while other categories are counted only for the
> >> currently active region.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> Cc: Aravind Iddamsetty 
> >> Cc: Rob Clark 
> >> ---
> >>   drivers/gpu/drm/i915/i915_drm_client.c | 85 ++
> >>   1 file changed, 85 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c
> >> b/drivers/gpu/drm/i915/i915_drm_client.c
> >> index ffccb6239789..5c77d6987d90 100644
> >> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> >> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> >> @@ -45,6 +45,89 @@ void __i915_drm_client_free(struct kref *kref)
> >>  }
> >>
> >>   #ifdef CONFIG_PROC_FS
> >> +static void
> >> +obj_meminfo(struct drm_i915_gem_object *obj,
> >> +struct drm_memory_stats stats[INTEL_REGION_UNKNOWN])
> >> +{
> >> +struct intel_memory_region *mr;
> >> +u64 sz = obj->base.size;
> >> +enum intel_region_id id;
> >> +unsigned int i;
> >> +
> >> +/* Attribute size and shared to all possible memory regions. */
> >> +for (i = 0; i < obj->mm.n_placements; i++) {
> >> +mr = obj->mm.placements[i];
> >> +id = mr->id;
> >> +
> >> +if (obj->base.handle_count > 1)
> >> +stats[id].shared += sz;
> >> +else
> >> +stats[id].private += sz;
> >> +}
> >> +
> >> +/* Attribute other categories to only the current region. */
> >> +mr = obj->mm.region;
> >> +if (mr)
> >> +id = mr->id;
> >> +else
> >> +id = INTEL_REGION_SMEM;
> >> +
> >> +if (!obj->mm.n_placements) {
> >> +if (obj->base.handle_count > 1)
> >> +stats[id].shared += sz;
> >> +else
> >> +stats[id].private += sz;
> >> +}
> >> +
> >> +if (i915_gem_object_has_pages(obj)) {
> >> +stats[id].resident += sz;
> >> +
> >> +if (!dma_resv_test_signaled(obj->base.resv,
> >> +dma_resv_usage_rw(true)))
> >
> > Should not DMA_RESV_USAGE_BOOKKEEP also be considered active (why only "rw")?
> > Some app is syncing with syncjobs and has added a dma_fence with
> > DMA_RESV_USAGE_BOOKKEEP during execbuf while that BO is busy waiting on
> > work!
>
> Hmm do we have a path which adds DMA_RESV_USAGE_BOOKKEEP usage in execbuf?
>
> Rob, any comments here? Given how I basically lifted the logic from
> 686b21b5f6ca ("drm: Add fdinfo memory stats"), does it sound plausible
> to upgrade the test against all fences?

Yes, I think so.. I don't have any use for BOOKKEEP so I hadn't considered it

BR,
-R
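
A minimal sketch of that upgrade, for reference — DMA_RESV_USAGE_BOOKKEEP
is the most inclusive usage class, so testing against it considers every
fence attached to the reservation object:

	if (i915_gem_object_has_pages(obj)) {
		stats[id].resident += sz;

		/* Consider fences of all usage classes, incl. BOOKKEEP. */
		if (!dma_resv_test_signaled(obj->base.resv,
					    DMA_RESV_USAGE_BOOKKEEP))
			stats[id].active += sz;
	}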


>
> Regards,
>
> Tvrtko
>
> >> +stats[id].active += sz;
> >> +else if (i915_gem_object_is_shrinkable(obj) &&
> >> + obj->mm.madv == I915_MADV_DONTNEED)
> >> +stats[id].purgeable += sz;
> >> +}
> >> +}
> >> +
> >> +static void show_meminfo(struct drm_printer *p, struct drm_file *file)
> >> +{
> >> +struct drm_memory_stats stats[INTEL_REGION_UNKNOWN] = {};
> >> +struct drm_i915_file_private *fpriv = file->driver_priv;
> >> +struct i915_drm_client *client = fpriv
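
For context, show_meminfo() can conclude by handing each region's
accumulated stats to the drm_print_memory_stats() helper added in
686b21b5f6ca ("drm: Add fdinfo memory stats"); the iteration and
supported-status mask below are illustrative, not the patch verbatim:

	for (id = 0; id < INTEL_REGION_UNKNOWN; id++) {
		struct intel_memory_region *mr = i915->mm.regions[id];

		if (!mr)
			continue;

		drm_print_memory_stats(p, &stats[id],
				       DRM_GEM_OBJECT_RESIDENT |
				       DRM_GEM_OBJECT_PURGEABLE,
				       mr->name);
	}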

Re: [Intel-gfx] [PATCH v2] drm: Update file owner during use

2023-08-28 Thread Rob Clark
On Wed, Jun 21, 2023 at 2:48 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> With the typical model where the display server opens the file descriptor
> and then hands it over to the client(*), we were showing stale data in
> debugfs.
>
> Fix it by updating the drm_file->pid on ioctl access from a different
> process.
>
> The field is also made RCU protected to allow for lockless readers. Update
> side is protected with dev->filelist_mutex.
>
> Before:
>
> $ cat /sys/kernel/debug/dri/0/clients
>  command   pid dev master a   uid  magic
> Xorg  2344   0   y  y 0  0
> Xorg  2344   0   n  y 0  2
> Xorg  2344   0   n  y 0  3
> Xorg  2344   0   n  y 0  4
>
> After:
>
> $ cat /sys/kernel/debug/dri/0/clients
>  command  tgid dev master a   uid  magic
> Xorg   830   0   y  y 0  0
>xfce4-session   880   0   n  y 0  1
>xfwm4   943   0   n  y 0  2
>neverball  1095   0   n  y 0  3
>
> *)
> More detailed and historically accurate description of various handover
> implementation kindly provided by Emil Velikov:
>
> """
> The traditional model, the server was the orchestrator managing the
> primary device node. From the fd, to the master status and
> authentication. But looking at the fd alone, this has varied across
> the years.
>
> IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
> fd(s) and reuse those whenever needed, DRI2 the client was responsible
> for open() themselves and with DRI3 the fd was passed to the client.
>
> Around the inception of DRI3 and systemd-logind, the latter became
> another possible orchestrator. Whereby Xorg and Wayland compositors
> could ask it for the fd. For various reasons (hysterical and genuine
> ones) Xorg has a fallback path going the open(), whereas Wayland
> compositors are moving to solely relying on logind... some never had
> fallback even.
>
> Over the past few years, more projects have emerged which provide
> functionality similar (be that on API level, Dbus, or otherwise) to
> systemd-logind.
> """
>
> v2:
>  * Fixed typo in commit text and added a fine historical explanation
>from Emil.
>
> Signed-off-by: Tvrtko Ursulin 
> Cc: "Christian König" 
> Cc: Daniel Vetter 
> Acked-by: Christian König 
> Reviewed-by: Emil Velikov 

Reviewed-by: Rob Clark 
Tested-by: Rob Clark 
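
For reference, the update side can be sketched as pairing an RCU pointer
swap with dev->filelist_mutex; a simplified version, assuming the __rcu
annotated drm_file->pid field this patch introduces (filp being the
drm_file, dev its drm_device — not the patch verbatim):

	struct pid *pid = task_tgid(current);
	struct pid *old;

	/* Unlocked fast path: handover happens once, then the same
	 * task group keeps issuing ioctls. */
	if (pid == rcu_access_pointer(filp->pid))
		return;

	mutex_lock(&dev->filelist_mutex);
	get_pid(pid);
	old = rcu_replace_pointer(filp->pid, pid, 1);
	mutex_unlock(&dev->filelist_mutex);

	synchronize_rcu();
	put_pid(old);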

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c |  6 ++--
>  drivers/gpu/drm/drm_auth.c  |  3 +-
>  drivers/gpu/drm/drm_debugfs.c   | 10 ---
>  drivers/gpu/drm/drm_file.c  | 40 +++--
>  drivers/gpu/drm/drm_ioctl.c |  3 ++
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  5 +++-
>  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c |  6 ++--
>  include/drm/drm_file.h  | 13 ++--
>  8 files changed, 71 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 74055cba3dc9..849097dff02b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -963,6 +963,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
> *m, void *unused)
> list_for_each_entry(file, &dev->filelist, lhead) {
> struct task_struct *task;
> struct drm_gem_object *gobj;
> +   struct pid *pid;
> int id;
>
> /*
> @@ -972,8 +973,9 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file 
> *m, void *unused)
>  * Therefore, we need to protect this ->comm access using RCU.
>  */
> rcu_read_lock();
> -   task = pid_task(file->pid, PIDTYPE_TGID);
> -   seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid),
> +   pid = rcu_dereference(file->pid);
> +   task = pid_task(pid, PIDTYPE_TGID);
> +   seq_printf(m, "pid %8d command %s:\n", pid_nr(pid),
>task ? task->comm : "<unknown>");
> rcu_read_unlock();
>
> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> index cf92a9ae8034..2ed2585ded37 100644
> --- a/drivers/gpu/drm/drm_auth.c
> +++ b/drivers/gpu/drm/drm_auth.c
> @@ -235,7 +235,8 @@ static int drm_new_set_master(struct drm_device *dev, 
> struct drm_file *fpriv)
>  static int
>  d

Re: [Intel-gfx] [PATCH 6/8] drm: Add drm_gem_prime_fd_to_handle_obj

2023-06-09 Thread Rob Clark
On Fri, Jun 9, 2023 at 7:12 AM Tvrtko Ursulin
 wrote:
>
>
> On 09/06/2023 13:44, Iddamsetty, Aravind wrote:
> > On 09-06-2023 17:41, Tvrtko Ursulin wrote:
> >> From: Tvrtko Ursulin 
> >>
> >> I need a new flavour of the drm_gem_prime_fd_to_handle helper, one which
> >> will return a reference to a newly created GEM object (if created), in
> >> order to enable tracking of imported i915 GEM objects in the following
> >> patch.
> >
> > instead of this what if we implement the open call back in i915
> >
> > struct drm_gem_object_funcs {
> >
> >  /**
> >   * @open:
> >   *
> >   * Called upon GEM handle creation.
> >   *
> >   * This callback is optional.
> >   */
> >  int (*open)(struct drm_gem_object *obj, struct drm_file *file);
> >
> > which gets called whenever a handle (drm_gem_handle_create_tail) is
> > created, and in the open we can check if to_intel_bo(obj)->base.dma_buf
> > is set; if so it is imported, if not it is owned or created by it.
>
> I wanted to track as much memory usage as we have which is buffer object
> backed, including page tables and contexts. And since those are not user
> visible (they don't have handles), they wouldn't be covered by the
> obj->funcs->open() callback.
>
> If we wanted to limit to objects with handles we could simply do what
> Rob proposed and that is to walk the handles idr. But that does not feel
> like the right direction to me. Open for discussion I guess.

I guess you just have a few special case objects per context?
Wouldn't it be easier to just track _those_ specially and append them
to the results after doing the normal idr table walk?

(Also, doing something special for dma-buf smells a bit odd..
considering that we also have legacy flink name based sharing as
well.)

BR,
-R
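
A rough sketch of that suggestion, with add_ctx_objects() as a
hypothetical helper rather than an existing i915 function:

	/* Normal walk over handle-visible objects first... */
	spin_lock(&file->table_lock);
	idr_for_each_entry(&file->object_idr, obj, id)
		add_obj(to_intel_bo(obj), stats);
	spin_unlock(&file->table_lock);

	/* ...then append the few handle-less objects tracked per
	 * context (rings, page tables, ...). */
	list_for_each_entry(ctx, &client->ctx_list, client_link)
		add_ctx_objects(ctx, stats);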


Re: [Intel-gfx] [PATCH i-g-t 0/2] gputop/intel_gpu_top: Move name to be the last field

2023-05-15 Thread Rob Clark
On Mon, May 15, 2023 at 6:36 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Rob,
>
> I thought maybe when you add memory stats the same field order as top(1)
> would feel more natural? That is, the client name comes last and is left justified.
> All other stats then come in the middle, between PID and NAME.
>
> DRM minor 0
> PID render copy video video-enhance  NAME
>2704 |▌   ||||||| kwin_x11
>2734 |▏   ||||||| plasmashell
>3932 |||||||| krunner
>4414 |||||||| xdg-desktop-por
> 1999477 |||||||| firefox
> 2162094 |||||||| thunderbir

Seems like a good idea, and more in line with top/htop/nvtop

BR,
-R

> intel-gpu-top: Intel Alderlake_s (Gen12) @ /dev/dri/card0 -   15/  15 MHz
> 99% RC6;  0.01/ 5.46 W;   34 irqs/s
>
>  ENGINES BUSY  MI_SEMA  MI_WAIT
>Render/3D  1.31% |▌   |  0%  0%
>  Blitter  0.00% ||  0%  0%
>Video  0.00% ||  0%  0%
> VideoEnhance  0.00% ||  0%  0%
>
> PID   Render/3D  BlitterVideo  VideoEnhance  NAME
>2734 |▏   ||||||| plasmashell
>2704 |▏   ||||||| kwin_x11
>1837 |▏   ||||||| Xorg
> 3429732 |||||    ||    | kwrite
> 2162094 |||||||| thunderbird
>
> Cc: Rob Clark 
>
> Tvrtko Ursulin (2):
>   gputop: Move client name last
>   intel_gpu_top: Move client name last
>
>  tools/gputop.c| 19 +--
>  tools/intel_gpu_top.c | 19 +--
>  2 files changed, 18 insertions(+), 20 deletions(-)
>
> --
> 2.37.2
>


Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 8/8] gputop: Basic vendor agnostic GPU top tool

2023-05-12 Thread Rob Clark
On Thu, Apr 6, 2023 at 7:33 AM Tvrtko Ursulin
 wrote:
>
>
> On 06/04/2023 15:21, Rob Clark wrote:
> > On Thu, Apr 6, 2023 at 4:08 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 05/04/2023 18:57, Rob Clark wrote:
> >>> On Tue, Jan 31, 2023 at 3:33 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>> From: Tvrtko Ursulin 
> >>>>
> >>>> Rudimentary vendor agnostic example of how lib_igt_drm_clients can be 
> >>>> used
> >>>> to display a sorted by card and usage list of processes using GPUs.
> >>>>
> >>>> Borrows a bit of code from intel_gpu_top but for now omits the fancy
> >>>> features like interactive functionality, card selection, client
> >>>> aggregation, sort modes, JSON output  and pretty engine names. Also no
> >>>> support for global GPU or system metrics.
> >>>>
> >>>> On the other hand it shows clients from all DRM cards which
> >>>> intel_gpu_top does not do.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin 
> >>>> Cc: Rob Clark 
> >>>> Cc: Christian König 
> >>>> Acked-by: Christian König 
> >>>
> >>> Reviewed-by: Rob Clark 
> >>
> >> Presumably for 8/8 only?
> >>
> >> The rest of the series does not apply any more by now. I need to rebase..
> >
> > I didn't look closely at the rest of the series (was kinda assuming
> > that was mostly just moving things around).. but I see you rebased it
> > so I can take a look.
>
> There's a lot in there - first patch is extracting some code into a
> library, with the corresponding renames, but then there are six patches
> of tweaks and feature additions which finally make gputop possible.
>
> Hopefully you can penetrate the concepts. It was all at least Valgrind
> clean back in the day I first did it.
>

by now I've read (and rebased locally) the series, and even added a
couple things on top.. so r-b for the series, we should get this
landed

BR,
-R


[Intel-gfx] [PATCH] drm/syncobj: Add deadline support for syncobj waits

2023-04-26 Thread Rob Clark
From: Rob Clark 

Add a new flag to let userspace provide a deadline as a hint for syncobj
and timeline waits.  This gives a hint to the driver signaling the
backing fences about how soon userspace needs it to compete work, so it
can addjust GPU frequency accordingly.  An immediate deadline can be
given to provide something equivalent to i915 "wait boost".

v2: Use absolute u64 ns value for deadline hint, drop cap and driver
feature flag in favor of allowing count_handles==0 as a way for
userspace to probe kernel for support of new flag
v3: More verbose comments about UAPI
v4: Fix negative zero, s/deadline_ns/deadline_nsec/ for consistency with
existing ioctl struct fields

Signed-off-by: Rob Clark 
---
Discussion on mesa MR 
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2197)
seems to have died down.  So resending this as a singleton.  This version
has some minor cosmetic cleanups compared to the previous iteration.

There are some other remaining UABI bits, waiting for someone to have a
chance to implement userspace for, such as sync_file SET_DEADLINE ioctl,
which can be found: https://patchwork.freedesktop.org/series/93035/
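
For illustration, userspace consumption could look roughly like the
sketch below, using the flag and deadline_nsec field added by this patch
(per v2, issuing the ioctl with count_handles == 0 and the new flag set
can be used to probe for kernel support):

	#include <stdint.h>
	#include <string.h>
	#include <xf86drm.h>	/* drmIoctl(), pulls in drm.h */

	static int syncobj_wait_with_deadline(int fd, uint32_t handle,
					      int64_t timeout_nsec,
					      uint64_t deadline_nsec)
	{
		struct drm_syncobj_wait wait;

		memset(&wait, 0, sizeof(wait));
		wait.handles = (uintptr_t)&handle;
		wait.count_handles = 1;
		wait.timeout_nsec = timeout_nsec;
		wait.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;
		/* Absolute CLOCK_MONOTONIC ns; passing "now" gives an
		 * immediate deadline, i.e. the wait-boost behaviour. */
		wait.deadline_nsec = deadline_nsec;

		return drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
	}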

 drivers/gpu/drm/drm_syncobj.c | 64 ---
 include/uapi/drm/drm.h| 17 ++
 2 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0c2be8360525..3f86e2b84200 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -126,6 +126,11 @@
  * synchronize between the two.
  * This requirement is inherited from the Vulkan fence API.
  *
+ * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
+ * a fence deadline hint on the backing fences before waiting, to provide the
+ * fence signaler with an appropriate sense of urgency.  The deadline is
+ * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
+ *
  * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
  * handles as well as an array of u64 points and does a host-side wait on all
  * of syncobj fences at the given points simultaneously.
@@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
  uint32_t count,
  uint32_t flags,
  signed long timeout,
- uint32_t *idx)
+ uint32_t *idx,
+ ktime_t *deadline)
 {
struct syncobj_wait_entry *entries;
struct dma_fence *fence;
@@ -1053,6 +1059,15 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
}
 
+   if (deadline) {
+   for (i = 0; i < count; ++i) {
+   fence = entries[i].fence;
+   if (!fence)
+   continue;
+   dma_fence_set_deadline(fence, *deadline);
+   }
+   }
+
do {
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
  struct drm_file *file_private,
  struct drm_syncobj_wait *wait,
  struct drm_syncobj_timeline_wait 
*timeline_wait,
- struct drm_syncobj **syncobjs, bool timeline)
+ struct drm_syncobj **syncobjs, bool timeline,
+ ktime_t *deadline)
 {
signed long timeout = 0;
uint32_t first = ~0;
@@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 NULL,
 wait->count_handles,
 wait->flags,
-timeout, &first);
+timeout, &first,
+deadline);
if (timeout < 0)
return timeout;
wait->first_signaled = first;
@@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 
u64_to_user_ptr(timeline_wait->points),
 
timeline_wait->count_handles,
 timeline_wait->flags,
-timeout, &first);
+ 

Re: [Intel-gfx] [RFC 6/6] drm/i915: Implement fdinfo memory stats printing

2023-04-20 Thread Rob Clark
On Thu, Apr 20, 2023 at 6:11 AM Tvrtko Ursulin
 wrote:
>
>
> On 19/04/2023 15:38, Rob Clark wrote:
> > On Wed, Apr 19, 2023 at 7:06 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 18/04/2023 17:08, Rob Clark wrote:
> >>> On Tue, Apr 18, 2023 at 7:58 AM Tvrtko Ursulin
> >>>  wrote:
> >>>> On 18/04/2023 15:39, Rob Clark wrote:
> >>>>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>>>  wrote:
> >>>>>>
> >>>>>> From: Tvrtko Ursulin 
> >>>>>>
> >>>>>> Show how more driver specific set of memory stats could be shown,
> >>>>>> more specifically where object can reside in multiple regions, showing 
> >>>>>> all
> >>>>>> the supported stats, and where there is more to show than just user 
> >>>>>> visible
> >>>>>> objects.
> >>>>>>
> >>>>>> WIP...
> >>>>>>
> >>>>>> Signed-off-by: Tvrtko Ursulin 
> >>>>>> ---
> >>>>>> drivers/gpu/drm/i915/i915_driver.c |   5 ++
> >>>>>> drivers/gpu/drm/i915/i915_drm_client.c | 102 
> >>>>>> +
> >>>>>> drivers/gpu/drm/i915/i915_drm_client.h |   8 ++
> >>>>>> drivers/gpu/drm/i915/i915_drv.h|   2 +
> >>>>>> 4 files changed, 117 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> >>>>>> b/drivers/gpu/drm/i915/i915_driver.c
> >>>>>> index 6493548c69bf..4c70206cbc27 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_driver.c
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
> >>>>>> @@ -1806,6 +1806,11 @@ static const struct drm_driver i915_drm_driver 
> >>>>>> = {
> >>>>>>.dumb_create = i915_gem_dumb_create,
> >>>>>>.dumb_map_offset = i915_gem_dumb_mmap_offset,
> >>>>>>
> >>>>>> +#ifdef CONFIG_PROC_FS
> >>>>>> +   .query_fdinfo_memory_regions = 
> >>>>>> i915_query_fdinfo_memory_regions,
> >>>>>> +   .query_fdinfo_memory_stats = i915_query_fdinfo_memory_stats,
> >>>>>> +#endif
> >>>>>> +
> >>>>>>.ioctls = i915_ioctls,
> >>>>>>.num_ioctls = ARRAY_SIZE(i915_ioctls),
> >>>>>> .fops = &i915_driver_fops,
> >>>>>> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> >>>>>> b/drivers/gpu/drm/i915/i915_drm_client.c
> >>>>>> index c654984189f7..65857c68bdb3 100644
> >>>>>> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> >>>>>> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> >>>>>> @@ -12,6 +12,7 @@
> >>>>>> #include 
> >>>>>>
> >>>>>> #include "gem/i915_gem_context.h"
> >>>>>> +#include "intel_memory_region.h"
> >>>>>> #include "i915_drm_client.h"
> >>>>>> #include "i915_file_private.h"
> >>>>>> #include "i915_gem.h"
> >>>>>> @@ -112,4 +113,105 @@ void i915_drm_client_fdinfo(struct drm_printer 
> >>>>>> *p, struct drm_file *file)
> >>>>>>for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> >>>>>>show_client_class(p, i915, file_priv->client, i);
> >>>>>> }
> >>>>>> +
> >>>>>> +char **
> >>>>>> +i915_query_fdinfo_memory_regions(struct drm_device *dev, unsigned int 
> >>>>>> *num)
> >>>>>> +{
> >>>>>> +   struct drm_i915_private *i915 = to_i915(dev);
> >>>>>> +   struct intel_memory_region *mr;
> >>>>>> +   enum intel_region_id id;
> >>>>>> +
> >>>>>> +   /* FIXME move to init */
> >>>>>> +   for_each_memory_region(mr, i915, id) {
> >>>>>> +   if (!i915->mm.region_names[id])
> >>>>>> +   i915->mm.re

Re: [Intel-gfx] [RFC 4/6] drm: Add simple fdinfo memory helpers

2023-04-20 Thread Rob Clark
On Thu, Apr 20, 2023 at 6:14 AM Tvrtko Ursulin
 wrote:
>
>
> On 19/04/2023 15:32, Rob Clark wrote:
> > On Wed, Apr 19, 2023 at 6:16 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 18/04/2023 18:18, Rob Clark wrote:
> >>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>> From: Tvrtko Ursulin 
> >>>>
> >>>> For drivers who only wish to show one memory region called 'system',
> >>>> and only account the GEM buffer object handles under it.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin 
> >>>> ---
> >>>>drivers/gpu/drm/drm_file.c | 45 ++
> >>>>include/drm/drm_file.h |  6 +
> >>>>2 files changed, 51 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >>>> index e202f79e816d..1e70669dddf7 100644
> >>>> --- a/drivers/gpu/drm/drm_file.c
> >>>> +++ b/drivers/gpu/drm/drm_file.c
> >>>> @@ -872,6 +872,51 @@ void drm_send_event(struct drm_device *dev, struct 
> >>>> drm_pending_event *e)
> >>>>}
> >>>>EXPORT_SYMBOL(drm_send_event);
> >>>>
> >>>> +static void
> >>>> +add_obj(struct drm_gem_object *obj, struct drm_fdinfo_memory_stat 
> >>>> *stats)
> >>>> +{
> >>>> +   u64 sz = obj->size;
> >>>> +
> >>>> +   stats[0].size += sz;
> >>>> +
> >>>> +   if (obj->handle_count > 1)
> >>>> +   stats[0].shared += sz;
> >>>> +
> >>>> +   if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)))
> >>>> +   stats[0].active += sz;
> >>>> +
> >>>> +   /* Not supported. */
> >>>> +   stats[0].resident = ~0ull;
> >>>> +   stats[0].purgeable = ~0ull;
> >>>
> >>> Hmm, this kinda makes the simple helper not very useful.  In my
> >>> version, you get something that is useful for all UMA drivers (which
> >>> all, IIRC, have some form of madv ioctl).  I was kinda imagining that
> >>> for ttm drivers, my print_memory_stats() would just become a helper
> >>> and that TTM (or "multi-region") drivers would have their own thing.
> >>
> >> Hm how? Your version also needed a driver specific vfunc:
> >
> > Sure, but there is a possibility for a driver specific vfunc ;-)
>
> Indeed, they are there in both proposals! :)
>
> > And arguably we could move madv to drm_gem_object, along with get/put
> > pages tracking.. since all the UMA drivers pretty much do the same
> > thing.  So maybe the vfunc is a temporary thing.
> >
> > Anyways, I could go back to my original version where it was a helper
> > called from the driver to print mem stats.  That way, TTM drivers
> > could do their own thing.
>
> I must have missed that. So it was the same idea as in my RFC?

similar, danvet asked for something "more core" ;-)

https://patchwork.freedesktop.org/patch/531403/?series=116216&rev=1

BR,
-R

> Regards,
>
> Tvrtko
>
> > BR,
> > -R
> >
> >> +static enum drm_gem_object_status msm_gem_status(struct drm_gem_object 
> >> *obj)
> >> +{
> >> +   struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >> +   enum drm_gem_object_status status = 0;
> >> +
> >> +   if (msm_obj->pages)
> >> +   status |= DRM_GEM_OBJECT_RESIDENT;
> >> +
> >> +   if (msm_obj->madv == MSM_MADV_DONTNEED)
> >> +   status |= DRM_GEM_OBJECT_PURGEABLE;
> >> +
> >> +   return status;
> >> +}
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>>
> >>> BR,
> >>> -R
> >>>
> >>>> +}
> >>>> +
> >>>> +char **
> >>>> +drm_query_fdinfo_system_region(struct drm_device *dev, unsigned int 
> >>>> *num)
> >>>> +{
> >>>> +   static char *region[] = {
> >>>> +   "system",
> >>>> +   };
> >>>> +
> >>>> +   *num = 1;
> >>>> +
> >>>> +   ret

Re: [Intel-gfx] [RFC 6/6] drm/i915: Implement fdinfo memory stats printing

2023-04-19 Thread Rob Clark
On Wed, Apr 19, 2023 at 7:06 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 17:08, Rob Clark wrote:
> > On Tue, Apr 18, 2023 at 7:58 AM Tvrtko Ursulin
> >  wrote:
> >> On 18/04/2023 15:39, Rob Clark wrote:
> >>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>> From: Tvrtko Ursulin 
> >>>>
> >>>> Show how more driver specific set of memory stats could be shown,
> >>>> more specifically where object can reside in multiple regions, showing 
> >>>> all
> >>>> the supported stats, and where there is more to show than just user 
> >>>> visible
> >>>> objects.
> >>>>
> >>>> WIP...
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin 
> >>>> ---
> >>>>drivers/gpu/drm/i915/i915_driver.c |   5 ++
> >>>>drivers/gpu/drm/i915/i915_drm_client.c | 102 +
> >>>>drivers/gpu/drm/i915/i915_drm_client.h |   8 ++
> >>>>drivers/gpu/drm/i915/i915_drv.h|   2 +
> >>>>4 files changed, 117 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> >>>> b/drivers/gpu/drm/i915/i915_driver.c
> >>>> index 6493548c69bf..4c70206cbc27 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_driver.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_driver.c
> >>>> @@ -1806,6 +1806,11 @@ static const struct drm_driver i915_drm_driver = {
> >>>>   .dumb_create = i915_gem_dumb_create,
> >>>>   .dumb_map_offset = i915_gem_dumb_mmap_offset,
> >>>>
> >>>> +#ifdef CONFIG_PROC_FS
> >>>> +   .query_fdinfo_memory_regions = i915_query_fdinfo_memory_regions,
> >>>> +   .query_fdinfo_memory_stats = i915_query_fdinfo_memory_stats,
> >>>> +#endif
> >>>> +
> >>>>   .ioctls = i915_ioctls,
> >>>>   .num_ioctls = ARRAY_SIZE(i915_ioctls),
> >>>>   .fops = &i915_driver_fops,
> >>>> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> >>>> b/drivers/gpu/drm/i915/i915_drm_client.c
> >>>> index c654984189f7..65857c68bdb3 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> >>>> @@ -12,6 +12,7 @@
> >>>>#include 
> >>>>
> >>>>#include "gem/i915_gem_context.h"
> >>>> +#include "intel_memory_region.h"
> >>>>#include "i915_drm_client.h"
> >>>>#include "i915_file_private.h"
> >>>>#include "i915_gem.h"
> >>>> @@ -112,4 +113,105 @@ void i915_drm_client_fdinfo(struct drm_printer *p, 
> >>>> struct drm_file *file)
> >>>>   for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> >>>>   show_client_class(p, i915, file_priv->client, i);
> >>>>}
> >>>> +
> >>>> +char **
> >>>> +i915_query_fdinfo_memory_regions(struct drm_device *dev, unsigned int 
> >>>> *num)
> >>>> +{
> >>>> +   struct drm_i915_private *i915 = to_i915(dev);
> >>>> +   struct intel_memory_region *mr;
> >>>> +   enum intel_region_id id;
> >>>> +
> >>>> +   /* FIXME move to init */
> >>>> +   for_each_memory_region(mr, i915, id) {
> >>>> +   if (!i915->mm.region_names[id])
> >>>> +   i915->mm.region_names[id] = mr->name;
> >>>> +   }
> >>>> +
> >>>> +   *num = id;
> >>>> +
> >>>> +   return i915->mm.region_names;
> >>>> +}
> >>>> +
> >>>> +static void
> >>>> +add_obj(struct drm_i915_gem_object *obj, struct drm_fdinfo_memory_stat 
> >>>> *stats)
> >>>> +{
> >>>> +struct intel_memory_region *mr;
> >>>> +   u64 sz = obj->base.size;
> >>>> +enum intel_region_id id;
> >>>> +   unsigned int i;
> >>>> +
> >>>> +   if (!obj)
> >>>> +   return;
> >>>> +
> >>>> +   /* Attribute size

Re: [Intel-gfx] [RFC 4/6] drm: Add simple fdinfo memory helpers

2023-04-19 Thread Rob Clark
On Wed, Apr 19, 2023 at 6:16 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 18:18, Rob Clark wrote:
> > On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> For drivers who only wish to show one memory region called 'system',
> >> and only account the GEM buffer object handles under it.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   drivers/gpu/drm/drm_file.c | 45 ++
> >>   include/drm/drm_file.h |  6 +
> >>   2 files changed, 51 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >> index e202f79e816d..1e70669dddf7 100644
> >> --- a/drivers/gpu/drm/drm_file.c
> >> +++ b/drivers/gpu/drm/drm_file.c
> >> @@ -872,6 +872,51 @@ void drm_send_event(struct drm_device *dev, struct 
> >> drm_pending_event *e)
> >>   }
> >>   EXPORT_SYMBOL(drm_send_event);
> >>
> >> +static void
> >> +add_obj(struct drm_gem_object *obj, struct drm_fdinfo_memory_stat *stats)
> >> +{
> >> +   u64 sz = obj->size;
> >> +
> >> +   stats[0].size += sz;
> >> +
> >> +   if (obj->handle_count > 1)
> >> +   stats[0].shared += sz;
> >> +
> >> +   if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)))
> >> +   stats[0].active += sz;
> >> +
> >> +   /* Not supported. */
> >> +   stats[0].resident = ~0ull;
> >> +   stats[0].purgeable = ~0ull;
> >
> > Hmm, this kinda makes the simple helper not very useful.  In my
> > version, you get something that is useful for all UMA drivers (which
> > all, IIRC, have some form of madv ioctl).  I was kinda imagining that
> > for ttm drivers, my print_memory_stats() would just become a helper
> > and that TTM (or "multi-region") drivers would have their own thing.
>
> Hm how? Your version also needed a driver specific vfunc:

Sure, but there is a possibility for a driver specific vfunc ;-)

And arguably we could move madv to drm_gem_object, along with get/put
pages tracking.. since all the UMA drivers pretty much do the same
thing.  So maybe the vfunc is a temporary thing.

Anyways, I could go back to my original version where it was a helper
called from the driver to print mem stats.  That way, TTM drivers
could do their own thing.

BR,
-R

> +static enum drm_gem_object_status msm_gem_status(struct drm_gem_object *obj)
> +{
> +   struct msm_gem_object *msm_obj = to_msm_bo(obj);
> +   enum drm_gem_object_status status = 0;
> +
> +   if (msm_obj->pages)
> +   status |= DRM_GEM_OBJECT_RESIDENT;
> +
> +   if (msm_obj->madv == MSM_MADV_DONTNEED)
> +   status |= DRM_GEM_OBJECT_PURGEABLE;
> +
> +   return status;
> +}
>
> Regards,
>
> Tvrtko
>
> >
> > BR,
> > -R
> >
> >> +}
> >> +
> >> +char **
> >> +drm_query_fdinfo_system_region(struct drm_device *dev, unsigned int *num)
> >> +{
> >> +   static char *region[] = {
> >> +   "system",
> >> +   };
> >> +
> >> +   *num = 1;
> >> +
> >> +   return region;
> >> +}
> >> +EXPORT_SYMBOL(drm_query_fdinfo_system_region);
> >> +
> >> +void
> >> +drm_query_fdinfo_system_memory(struct drm_file *file,
> >> +  struct drm_fdinfo_memory_stat *stats)
> >> +{
> >> +   struct drm_gem_object *obj;
> >> +   int id;
> >> +
> >> +   spin_lock(&file->table_lock);
> >> +   idr_for_each_entry(&file->object_idr, obj, id)
> >> +   add_obj(obj, stats);
> >> +   spin_unlock(&file->table_lock);
> >> +}
> >> +EXPORT_SYMBOL(drm_query_fdinfo_system_memory);
> >> +
> >>   static void
> >>   print_stat(struct drm_printer *p, const char *stat, const char *region, 
> >> u64 sz)
> >>   {
> >> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >> index 00d48beeac5c..dd7c6fb2c975 100644
> >> --- a/include/drm/drm_file.h
> >> +++ b/include/drm/drm_file.h
> >> @@ -383,6 +383,12 @@ struct drm_fdinfo_memory_stat {
> >>  u64 active;
> >>   };
> >>
> >> +char **drm_query_fdinfo_system_region(struct drm_device *dev,
> >> + unsigned int *num);
> >> +void drm_query_fdinfo_system_memory(struct drm_file *file,
> >> +   struct drm_fdinfo_memory_stat *stats);
> >> +
> >> +
> >>   /**
> >>* drm_is_primary_client - is this an open file of the primary node
> >>* @file_priv: DRM file
> >> --
> >> 2.37.2
> >>


Re: [Intel-gfx] [RFC 4/6] drm: Add simple fdinfo memory helpers

2023-04-18 Thread Rob Clark
On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> For drivers who only wish to show one memory region called 'system',
> and only account the GEM buffer object handles under it.
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/drm_file.c | 45 ++
>  include/drm/drm_file.h |  6 +
>  2 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index e202f79e816d..1e70669dddf7 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -872,6 +872,51 @@ void drm_send_event(struct drm_device *dev, struct 
> drm_pending_event *e)
>  }
>  EXPORT_SYMBOL(drm_send_event);
>
> +static void
> +add_obj(struct drm_gem_object *obj, struct drm_fdinfo_memory_stat *stats)
> +{
> +   u64 sz = obj->size;
> +
> +   stats[0].size += sz;
> +
> +   if (obj->handle_count > 1)
> +   stats[0].shared += sz;
> +
> +   if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)))
> +   stats[0].active += sz;
> +
> +   /* Not supported. */
> +   stats[0].resident = ~0ull;
> +   stats[0].purgeable = ~0ull;

Hmm, this kinda makes the simple helper not very useful.  In my
version, you get something that is useful for all UMA drivers (which
all, IIRC, have some form of madv ioctl).  I was kinda imagining that
for ttm drivers, my print_memory_stats() would just become a helper
and that TTM (or "multi-region") drivers would have their own thing.

BR,
-R

> +}
> +
> +char **
> +drm_query_fdinfo_system_region(struct drm_device *dev, unsigned int *num)
> +{
> +   static char *region[] = {
> +   "system",
> +   };
> +
> +   *num = 1;
> +
> +   return region;
> +}
> +EXPORT_SYMBOL(drm_query_fdinfo_system_region);
> +
> +void
> +drm_query_fdinfo_system_memory(struct drm_file *file,
> +  struct drm_fdinfo_memory_stat *stats)
> +{
> +   struct drm_gem_object *obj;
> +   int id;
> +
> +   spin_lock(&file->table_lock);
> +   idr_for_each_entry(&file->object_idr, obj, id)
> +   add_obj(obj, stats);
> +   spin_unlock(&file->table_lock);
> +}
> +EXPORT_SYMBOL(drm_query_fdinfo_system_memory);
> +
>  static void
>  print_stat(struct drm_printer *p, const char *stat, const char *region, u64 
> sz)
>  {
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 00d48beeac5c..dd7c6fb2c975 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -383,6 +383,12 @@ struct drm_fdinfo_memory_stat {
> u64 active;
>  };
>
> +char **drm_query_fdinfo_system_region(struct drm_device *dev,
> + unsigned int *num);
> +void drm_query_fdinfo_system_memory(struct drm_file *file,
> +   struct drm_fdinfo_memory_stat *stats);
> +
> +
>  /**
>   * drm_is_primary_client - is this an open file of the primary node
>   * @file_priv: DRM file
> --
> 2.37.2
>


Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 9:44 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 17:13, Rob Clark wrote:
> > On Tue, Apr 18, 2023 at 7:46 AM Tvrtko Ursulin
> >  wrote:
> >> On 18/04/2023 15:36, Rob Clark wrote:
> >>> On Tue, Apr 18, 2023 at 7:19 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>>
> >>>> On 18/04/2023 14:49, Rob Clark wrote:
> >>>>> On Tue, Apr 18, 2023 at 2:00 AM Tvrtko Ursulin
> >>>>>  wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 17/04/2023 20:39, Rob Clark wrote:
> >>>>>>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> From: Tvrtko Ursulin 
> >>>>>>>>
> >>>>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Tvrtko Ursulin 
> >>>>>>>> ---
> >>>>>>>>  Documentation/gpu/drm-usage-stats.rst | 12 +++
> >>>>>>>>  drivers/gpu/drm/drm_file.c| 52 
> >>>>>>>> +++
> >>>>>>>>  include/drm/drm_drv.h |  7 
> >>>>>>>>  include/drm/drm_file.h|  8 +
> >>>>>>>>  4 files changed, 79 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst 
> >>>>>>>> b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> index 2ab32c40e93c..8273a41b2fb0 100644
> >>>>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>>>> @@ -21,6 +21,7 @@ File format specification
> >>>>>>>>
> >>>>>>>>  - File shall contain one key value pair per one line of text.
> >>>>>>>>  - Colon character (`:`) must be used to delimit keys and values.
> >>>>>>>> +- Caret (`^`) is also a reserved character.
> >>>>>>>
> >>>>>>> this doesn't solve the problem that led me to drm-$CATEGORY-memory... 
> >>>>>>> ;-)
> >>>>>>
> >>>>>> Could you explain or remind me with a link to a previous explanation?
> >>>>>
> >>>>> How is userspace supposed to know that "drm-memory-foo" is a memory
> >>>>> type "foo" but drm-memory-foo^size is not memory type "foo^size"?
> >>>>
> >>>> Are you referring to nvtop?
> >>>
> >>> I'm not referring to any particular app.  It could even be some app
> >>> that isn't even written yet but started with an already existing
> >>> kernel without this change.  It is just a general point about forwards
> >>> compatibility of old userspace with new kernel.  And it doesn't really
> >>> matter what special character you use.  You can't retroactively define
> >>> some newly special characters.
> >>
> >> What you see does not work if we output both legacy and new key with
> >> extra category? Userspace which hardcode the name keep working, and
> >> userspace which parses region names as opaque strings also keeps working
> >> just shows more entries.
> >
> > well, it shows nonsense entries.. I'd not call that "working"
> >
> > But honestly we are wasting too many words on this.. we just can't
> > re-use the "drm-memory-" namespace, it is already burnt.
> > Full stop.
> >
> > If you don't like the "drm-$CATEGORY-$REGION" workaround then we can
> > shorten to "drm-mem-$REGION-$CATEGORY" since that won't accidentally
> > match the existing "drm-memory-" pattern.
>
> I can live with that token reversal, it was not the primary motivation
> for my RFC as we have discussed that side of things already before I
> sketched my version up.
>
> But I also still don't get what doesn't work with what I described and
> you did not really address my specific questions with more than a
> "doesn't work" with not much details.
>
> Unless for you it starts and ends with "nonsense entries". If so then it
> seems there is no option than to disagree and move on. Again, I can
> accept the drm-$category-memory-$region.

Yeah, it's about "nonsense entries".. and I am perhaps being a bit
pedantic about compatibility with old userspace (not like there are
100's of apps parsing drm fdinfo), but it is the principle of the
matter ;-)

BR,
-R


Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 7:46 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 15:36, Rob Clark wrote:
> > On Tue, Apr 18, 2023 at 7:19 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 18/04/2023 14:49, Rob Clark wrote:
> >>> On Tue, Apr 18, 2023 at 2:00 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>>
> >>>> On 17/04/2023 20:39, Rob Clark wrote:
> >>>>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>>>  wrote:
> >>>>>>
> >>>>>> From: Tvrtko Ursulin 
> >>>>>>
> >>>>>> Add support to dump GEM stats to fdinfo.
> >>>>>>
> >>>>>> Signed-off-by: Tvrtko Ursulin 
> >>>>>> ---
> >>>>>> Documentation/gpu/drm-usage-stats.rst | 12 +++
> >>>>>> drivers/gpu/drm/drm_file.c| 52 
> >>>>>> +++
> >>>>>> include/drm/drm_drv.h |  7 
> >>>>>> include/drm/drm_file.h|  8 +
> >>>>>> 4 files changed, 79 insertions(+)
> >>>>>>
> >>>>>> diff --git a/Documentation/gpu/drm-usage-stats.rst 
> >>>>>> b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> index 2ab32c40e93c..8273a41b2fb0 100644
> >>>>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>>>> @@ -21,6 +21,7 @@ File format specification
> >>>>>>
> >>>>>> - File shall contain one key value pair per one line of text.
> >>>>>> - Colon character (`:`) must be used to delimit keys and values.
> >>>>>> +- Caret (`^`) is also a reserved character.
> >>>>>
> >>>>> this doesn't solve the problem that led me to drm-$CATEGORY-memory... 
> >>>>> ;-)
> >>>>
> >>>> Could you explain or remind me with a link to a previous explanation?
> >>>
> >>> How is userspace supposed to know that "drm-memory-foo" is a memory
> >>> type "foo" but drm-memory-foo^size is not memory type "foo^size"?
> >>
> >> Are you referring to nvtop?
> >
> > I'm not referring to any particular app.  It could even be some app
> > that isn't even written yet but started with an already existing
> > kernel without this change.  It is just a general point about forwards
> > compatibility of old userspace with new kernel.  And it doesn't really
> > matter what special character you use.  You can't retroactively define
> > some newly special characters.
>
> What you see does not work if we output both legacy and new key with
> extra category? Userspace which hardcode the name keep working, and
> userspace which parses region names as opaque strings also keeps working
> just shows more entries.

well, it shows nonsense entries.. I'd not call that "working"

But honestly we are wasting too many words on this.. we just can't
re-use the "drm-memory-" namespace, it is already burnt.
Full stop.

If you don't like the "drm-$CATEGORY-$REGION" workaround then we can
shorten to "drm-mem-$REGION-$CATEGORY" since that won't accidentally
match the existing "drm-memory-" pattern.

BR,
-R
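
Purely to illustrate the shape being proposed, per-region keys under a
"drm-$CATEGORY-$REGION" scheme would read something like (values made up):

	drm-total-system:	8192 KiB
	drm-shared-system:	2048 KiB
	drm-resident-system:	4096 KiB
	drm-purgeable-system:	1024 KiB
	drm-active-system:	 512 KiB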

> Regards,
>
> Tvrtko
>
> >
> > BR,
> > -R
> >
> >> Indeed that one hardcodes:
> >>
> >> static const char drm_amdgpu_vram[] = "drm-memory-vram";
> >>
> >> And does brute strcmp on it:
> >>
> >> } else if (!strcmp(key, drm_amdgpu_vram_old) || !strcmp(key, 
> >> drm_amdgpu_vram)) {
> >>
> >> So okay for that one, if we need to keep it working I just change this in 
> >> my RFC:
> >>
> >> """
> >> Resident category is identical to the drm-memory-<region> key and two should
> >> be mutually exclusive.
> >> """
> >>
> >> Into this:
> >>
> >> """
> >> Resident category is identical to the drm-memory-<region> key and where the 
> >> categorized one is present the legacy one must be too in order to preserve 
> >> backward compatibility.
> >> """
> >>
> >> Addition to my RFC:
> >>
> >> ...
> >>  for (i = 0; i < num; i++)

Re: [Intel-gfx] [RFC 6/6] drm/i915: Implement fdinfo memory stats printing

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 7:58 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 15:39, Rob Clark wrote:
> > On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Show how more driver specific set of memory stats could be shown,
> >> more specifically where object can reside in multiple regions, showing all
> >> the supported stats, and where there is more to show than just user visible
> >> objects.
> >>
> >> WIP...
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   drivers/gpu/drm/i915/i915_driver.c |   5 ++
> >>   drivers/gpu/drm/i915/i915_drm_client.c | 102 +
> >>   drivers/gpu/drm/i915/i915_drm_client.h |   8 ++
> >>   drivers/gpu/drm/i915/i915_drv.h|   2 +
> >>   4 files changed, 117 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> >> b/drivers/gpu/drm/i915/i915_driver.c
> >> index 6493548c69bf..4c70206cbc27 100644
> >> --- a/drivers/gpu/drm/i915/i915_driver.c
> >> +++ b/drivers/gpu/drm/i915/i915_driver.c
> >> @@ -1806,6 +1806,11 @@ static const struct drm_driver i915_drm_driver = {
> >>  .dumb_create = i915_gem_dumb_create,
> >>  .dumb_map_offset = i915_gem_dumb_mmap_offset,
> >>
> >> +#ifdef CONFIG_PROC_FS
> >> +   .query_fdinfo_memory_regions = i915_query_fdinfo_memory_regions,
> >> +   .query_fdinfo_memory_stats = i915_query_fdinfo_memory_stats,
> >> +#endif
> >> +
> >>  .ioctls = i915_ioctls,
> >>  .num_ioctls = ARRAY_SIZE(i915_ioctls),
> >>  .fops = &i915_driver_fops,
> >> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> >> b/drivers/gpu/drm/i915/i915_drm_client.c
> >> index c654984189f7..65857c68bdb3 100644
> >> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> >> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> >> @@ -12,6 +12,7 @@
> >>   #include 
> >>
> >>   #include "gem/i915_gem_context.h"
> >> +#include "intel_memory_region.h"
> >>   #include "i915_drm_client.h"
> >>   #include "i915_file_private.h"
> >>   #include "i915_gem.h"
> >> @@ -112,4 +113,105 @@ void i915_drm_client_fdinfo(struct drm_printer *p, 
> >> struct drm_file *file)
> >>  for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> >>  show_client_class(p, i915, file_priv->client, i);
> >>   }
> >> +
> >> +char **
> >> +i915_query_fdinfo_memory_regions(struct drm_device *dev, unsigned int 
> >> *num)
> >> +{
> >> +   struct drm_i915_private *i915 = to_i915(dev);
> >> +   struct intel_memory_region *mr;
> >> +   enum intel_region_id id;
> >> +
> >> +   /* FIXME move to init */
> >> +   for_each_memory_region(mr, i915, id) {
> >> +   if (!i915->mm.region_names[id])
> >> +   i915->mm.region_names[id] = mr->name;
> >> +   }
> >> +
> >> +   *num = id;
> >> +
> >> +   return i915->mm.region_names;
> >> +}
> >> +
> >> +static void
> >> +add_obj(struct drm_i915_gem_object *obj, struct drm_fdinfo_memory_stat 
> >> *stats)
> >> +{
> >> +struct intel_memory_region *mr;
> >> +   u64 sz = obj->base.size;
> >> +enum intel_region_id id;
> >> +   unsigned int i;
> >> +
> >> +   if (!obj)
> >> +   return;
> >> +
> >> +   /* Attribute size and shared to all possible memory regions. */
> >> +   for (i = 0; i < obj->mm.n_placements; i++) {
> >> +   mr = obj->mm.placements[i];
> >> +   id = mr->id;
> >> +
> >> +   stats[id].size += sz;
> >
> > This implies that summing up all of the categories is not the same as
> > the toplevel stats that I was proposing

Sorry, I mis-spoke, I meant "summing up all of the regions is not..."

> Correct, my categories are a bit different. You had private and shared as two 
> mutually exclusive buckets, and then resident as subset of either/both. I 
> have size as analogue to VmSize and resident as a subset of that, analogue to 
> VmRss.
>

I split shared because by definition shared buffers can be counted
against multiple drm_f

Re: [Intel-gfx] [RFC 6/6] drm/i915: Implement fdinfo memory stats printing

2023-04-18 Thread Rob Clark
On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Show how more driver specific set of memory stats could be shown,
> more specifically where object can reside in multiple regions, showing all
> the supported stats, and where there is more to show than just user visible
> objects.
>
> WIP...
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/i915/i915_driver.c |   5 ++
>  drivers/gpu/drm/i915/i915_drm_client.c | 102 +
>  drivers/gpu/drm/i915/i915_drm_client.h |   8 ++
>  drivers/gpu/drm/i915/i915_drv.h|   2 +
>  4 files changed, 117 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> b/drivers/gpu/drm/i915/i915_driver.c
> index 6493548c69bf..4c70206cbc27 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -1806,6 +1806,11 @@ static const struct drm_driver i915_drm_driver = {
> .dumb_create = i915_gem_dumb_create,
> .dumb_map_offset = i915_gem_dumb_mmap_offset,
>
> +#ifdef CONFIG_PROC_FS
> +   .query_fdinfo_memory_regions = i915_query_fdinfo_memory_regions,
> +   .query_fdinfo_memory_stats = i915_query_fdinfo_memory_stats,
> +#endif
> +
> .ioctls = i915_ioctls,
> .num_ioctls = ARRAY_SIZE(i915_ioctls),
> .fops = &i915_driver_fops,
> diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> b/drivers/gpu/drm/i915/i915_drm_client.c
> index c654984189f7..65857c68bdb3 100644
> --- a/drivers/gpu/drm/i915/i915_drm_client.c
> +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> @@ -12,6 +12,7 @@
>  #include 
>
>  #include "gem/i915_gem_context.h"
> +#include "intel_memory_region.h"
>  #include "i915_drm_client.h"
>  #include "i915_file_private.h"
>  #include "i915_gem.h"
> @@ -112,4 +113,105 @@ void i915_drm_client_fdinfo(struct drm_printer *p, 
> struct drm_file *file)
> for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> show_client_class(p, i915, file_priv->client, i);
>  }
> +
> +char **
> +i915_query_fdinfo_memory_regions(struct drm_device *dev, unsigned int *num)
> +{
> +   struct drm_i915_private *i915 = to_i915(dev);
> +   struct intel_memory_region *mr;
> +   enum intel_region_id id;
> +
> +   /* FIXME move to init */
> +   for_each_memory_region(mr, i915, id) {
> +   if (!i915->mm.region_names[id])
> +   i915->mm.region_names[id] = mr->name;
> +   }
> +
> +   *num = id;
> +
> +   return i915->mm.region_names;
> +}
> +
> +static void
> +add_obj(struct drm_i915_gem_object *obj, struct drm_fdinfo_memory_stat 
> *stats)
> +{
> +struct intel_memory_region *mr;
> +   u64 sz = obj->base.size;
> +enum intel_region_id id;
> +   unsigned int i;
> +
> +   if (!obj)
> +   return;
> +
> +   /* Attribute size and shared to all possible memory regions. */
> +   for (i = 0; i < obj->mm.n_placements; i++) {
> +   mr = obj->mm.placements[i];
> +   id = mr->id;
> +
> +   stats[id].size += sz;

This implies that summing up all of the categories is not the same as
the toplevel stats that I was proposing
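(For example, a 4 MiB object with both a system and a device-local
placement adds 4 MiB to each region's size, so the regions sum to 8 MiB
for what is a single 4 MiB allocation.)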

BR,
-R

> +   if (obj->base.handle_count > 1)
> +   stats[id].shared += sz;
> +   }
> +
> +   /* Attribute other categories to only the current region. */
> +   mr = obj->mm.region;
> +   if (mr)
> +   id = mr->id;
> +   else
> +   id = INTEL_REGION_SMEM;
> +
> +   if (!i915_gem_object_has_pages(obj))
> +   return;
> +
> +   stats[id].resident += sz;
> +
> +   if (!dma_resv_test_signaled(obj->base.resv, dma_resv_usage_rw(true)))
> +   stats[id].active += sz;
> +   else if (i915_gem_object_is_shrinkable(obj) &&
> +   obj->mm.madv == I915_MADV_DONTNEED)
> +   stats[id].purgeable += sz;
> +}
> +
> +void
> +i915_query_fdinfo_memory_stats(struct drm_file *file,
> +  struct drm_fdinfo_memory_stat *stats)
> +{
> +   struct drm_i915_file_private *file_priv = file->driver_priv;
> +   struct i915_drm_client *client = file_priv->client;
> +   struct drm_gem_object *drm_obj;
> +   struct i915_gem_context *ctx;
> +   int id;
> +
> +   /*
> +* FIXME - we can do this better and in fewer passes if we are to 
> start
> +* exporting proper memory stats.
> +*/
> +
> +   /* User created objects */
> +   spin_lock(&file->table_lock);
> +   idr_for_each_entry(&file->object_idr, drm_obj, id)
> +   add_obj(to_intel_bo(drm_obj), stats);
> +   spin_unlock(&file->table_lock);
> +
> +   /* Contexts, rings, timelines, page tables, ... */
> +   rcu_read_lock();
> +   list_for_each_entry_rcu(ctx, &client->ctx_list, client_link) {
> +   struct i915_gem_engines_iter it;
> +   struct intel_context *ce;
> +
> +   for_each_gem_engine(ce, 

Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 7:19 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/04/2023 14:49, Rob Clark wrote:
> > On Tue, Apr 18, 2023 at 2:00 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 17/04/2023 20:39, Rob Clark wrote:
> >>> On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >>>  wrote:
> >>>>
> >>>> From: Tvrtko Ursulin 
> >>>>
> >>>> Add support to dump GEM stats to fdinfo.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin 
> >>>> ---
> >>>>Documentation/gpu/drm-usage-stats.rst | 12 +++
> >>>>drivers/gpu/drm/drm_file.c| 52 +++
> >>>>include/drm/drm_drv.h |  7 
> >>>>include/drm/drm_file.h|  8 +
> >>>>4 files changed, 79 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/gpu/drm-usage-stats.rst 
> >>>> b/Documentation/gpu/drm-usage-stats.rst
> >>>> index 2ab32c40e93c..8273a41b2fb0 100644
> >>>> --- a/Documentation/gpu/drm-usage-stats.rst
> >>>> +++ b/Documentation/gpu/drm-usage-stats.rst
> >>>> @@ -21,6 +21,7 @@ File format specification
> >>>>
> >>>>- File shall contain one key value pair per one line of text.
> >>>>- Colon character (`:`) must be used to delimit keys and values.
> >>>> +- Caret (`^`) is also a reserved character.
> >>>
> >>> this doesn't solve the problem that led me to drm-$CATEGORY-memory... ;-)
> >>
> >> Could you explain or remind me with a link to a previous explanation?
> >
> > How is userspace supposed to know that "drm-memory-foo" is a memory
> > type "foo" but drm-memory-foo^size is not memory type "foo^size"?
>
> Are you referring to nvtop?

I'm not referring to any particular app.  It could even be some app
that isn't even written yet but started with an already existing
kernel without this change.  It is just a general point about forwards
compatibility of old userspace with new kernel.  And it doesn't really
matter what special character you use.  You can't retroactively define
some newly special characters.

BR,
-R

> Indeed that one hardcodes:
>
>static const char drm_amdgpu_vram[] = "drm-memory-vram";
>
> And does brute strcmp on it:
>
>} else if (!strcmp(key, drm_amdgpu_vram_old) || !strcmp(key, 
> drm_amdgpu_vram)) {
>
> So okay for that one, if we need to keep it working I just change this in my 
> RFC:
>
> """
> Resident category is identical to the drm-memory-<region> key and two should be
> mutually exclusive.
> """
>
> Into this:
>
> """
> Resident category is identical to the drm-memory-<region> key and where the 
> categorized one is present the legacy one must be too in order to preserve 
> backward compatibility.
> """
>
> Addition to my RFC:
>
> ...
> for (i = 0; i < num; i++) {
> if (!regions[i]) /* Allow sparse name arrays. */
> continue;
>
> print_stat(p, "size", regions[i], stats[i].size);
> print_stat(p, "shared", regions[i], stats[i].shared);
> +   print_stat(p, "", regions[i], stats[i].resident);
> print_stat(p, "resident", regions[i], stats[i].resident);
> print_stat(p, "purgeable", regions[i], stats[i].purgeable);
> print_stat(p, "active", regions[i], stats[i].active);
> }
> ...
>
> Results in output like this (in theory, if/when amdgpu takes on the extended 
> format):
>
> drm-memory-vram-size: ... KiB
> drm-memory-vram: $same KiB
> drm-memory-vram-resident: $same KiB
> drm-memory-vram-...:
>
> Regards,
>
> Tvrtko
>
> P.S. Would a slash instead of a caret be prettier?
>
> >> What tool barfs on it and how? Spirit of drm-usage-stats.pl is that for
> >> both drm-engine-<keystr>: and drm-memory-<region>: generic userspace is
> >> supposed to take the whole <keystr> as opaque and just enumerate all it finds.
> >>
> >> Gputop manages to do that with engines names it knows _nothing_ about.
> >> If it worked with memory regions it would just show the list of new
> >> regions as for example "system^resident", "system^active". Then tools
> >> can be extended to understand it better and provide a sub-breakdown if
> >> wanted.
> >>
> >>

Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 3:47 AM Tvrtko Ursulin
 wrote:
>
>
> On 17/04/2023 17:20, Christian König wrote:
> > On 17.04.23 at 17:56, Tvrtko Ursulin wrote:
> >> From: Tvrtko Ursulin 
> >>
> >> Add support to dump GEM stats to fdinfo.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   Documentation/gpu/drm-usage-stats.rst | 12 +++
> >>   drivers/gpu/drm/drm_file.c| 52 +++
> >>   include/drm/drm_drv.h |  7 
> >>   include/drm/drm_file.h|  8 +
> >>   4 files changed, 79 insertions(+)
> >>
> >> diff --git a/Documentation/gpu/drm-usage-stats.rst
> >> b/Documentation/gpu/drm-usage-stats.rst
> >> index 2ab32c40e93c..8273a41b2fb0 100644
> >> --- a/Documentation/gpu/drm-usage-stats.rst
> >> +++ b/Documentation/gpu/drm-usage-stats.rst
> >> @@ -21,6 +21,7 @@ File format specification
> >>   - File shall contain one key value pair per one line of text.
> >>   - Colon character (`:`) must be used to delimit keys and values.
> >> +- Caret (`^`) is also a reserved character.
> >>   - All keys shall be prefixed with `drm-`.
> >>   - Whitespace between the delimiter and first non-whitespace
> >> character shall be
> >> ignored when parsing.
> >> @@ -105,6 +106,17 @@ object belong to this client, in the respective
> >> memory region.
> >>   Default unit shall be bytes with optional unit specifiers of 'KiB'
> >> or 'MiB'
> >>   indicating kibi- or mebi-bytes.
> >> +- drm-memory-<region>^size: <uint> [KiB|MiB]
> >> +- drm-memory-<region>^shared: <uint> [KiB|MiB]
> >> +- drm-memory-<region>^resident: <uint> [KiB|MiB]
> >> +- drm-memory-<region>^purgeable: <uint> [KiB|MiB]
> >> +- drm-memory-<region>^active: <uint> [KiB|MiB]
> >
> > What exactly does size/shared/active mean here?
> >
> > If it means what I think it does I don't see how TTM based drivers
> > should track that in the first place.
>
> Size is an analogue to process VM size - maximum reachable/allocated
> memory belonging to a client.
>
> Shared could be IMO viewed as a bit dodgy and either could be dropped or
> needs to be better defined. For now I simply followed the implementation
> from Rob's RFC which is:
>
> if (obj->handle_count > 1)
> stats[0].shared += sz;
>
> I can see some usefulness to it but haven't thought much about semantics
> yet.
>
> Similar story with active which I think is not very useful.
> Implementation is like this:
>
> if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)))
> stats[0].active += sz;
>
> For me it is too transient to bring much value over the resident
> category. I supposed only advantage is that resident (as does purgeable)
> needs driver cooperation while active can be done like about from DRM
> core. Although I am not a big fan of counting these stats from the core
> to begin with..

Maybe there is a better way to track it, like setting an age/time when
the buffer is last active (which could be made part of dma_resv to
make it automatic). The question I really want to answer is more like
"over the last T ms how many buffers were active".  This is a useful
metric esp when you think about a use-case like the browser where you
might have a lot of textures/etc for your 80 different tabs but at any
given time only a small subset is active and the rest can be swapped
out to zram if needed.

BR,
-R
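
One possible shape of that idea, with last_active as a hypothetical
timestamp field on the object (not something the RFC or current kernels
have):

	/* Refresh the stamp whenever the object is seen busy. */
	if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true)))
		obj->last_active = ktime_get();

	/* At fdinfo time, count anything busy within the last T ms. */
	if (ktime_ms_delta(ktime_get(), obj->last_active) < active_window_ms)
		stats[0].active += sz;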

>
> Regards,
>
> Tvrtko
>
> >> +Resident category is identical to the drm-memory-<region> key and two
> >> +should be mutually exclusive.
> >> +
> >> +TODO more description text...
> >> +
> >>   - drm-cycles- 
> >>   Engine identifier string must be the same as the one specified in the
> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >> index 37b4f76a5191..e202f79e816d 100644
> >> --- a/drivers/gpu/drm/drm_file.c
> >> +++ b/drivers/gpu/drm/drm_file.c
> >> @@ -42,6 +42,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include "drm_crtc_internal.h"
> >> @@ -871,6 +872,54 @@ void drm_send_event(struct drm_device *dev,
> >> struct drm_pending_event *e)
> >>   }
> >>   EXPORT_SYMBOL(drm_send_event);
> >> +static void
> >> +print_stat(struct drm_printer *p, const char *stat, const char
> >> *region, u64 sz)
> >> +{
> >> +const char *units[] = {"", " KiB", " MiB"};
> >> +unsigned int u;
> >> +
> >> +if (sz == ~0ull) /* Not supported by the driver. */
> >> +return;
> >> +
> >> +for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >> +if (sz < SZ_1K)
> >> +break;
> >> +sz = div_u64(sz, SZ_1K);
> >> +}
> >> +
> >> +drm_printf(p, "drm-memory-%s^%s:\t%llu%s\n",
> >> +   region, stat, sz, units[u]);
> >> +}
> >> +
> >> +static void print_memory_stats(struct drm_printer *p, struct drm_file
> >> *file)
> >> +{
> >> +struct drm_device *dev = file->minor->dev;
> >> +struct drm_fdinfo_memory_stat *stats;
> >> +unsigned int num, i;
> >> +char **regions;
> >> +
> >> +regions = dev->driver->query_fdinfo_memory_regions(dev, &num);

Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-18 Thread Rob Clark
On Tue, Apr 18, 2023 at 2:00 AM Tvrtko Ursulin
 wrote:
>
>
> On 17/04/2023 20:39, Rob Clark wrote:
> > On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Add support to dump GEM stats to fdinfo.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   Documentation/gpu/drm-usage-stats.rst | 12 +++
> >>   drivers/gpu/drm/drm_file.c| 52 +++
> >>   include/drm/drm_drv.h |  7 
> >>   include/drm/drm_file.h|  8 +
> >>   4 files changed, 79 insertions(+)
> >>
> >> diff --git a/Documentation/gpu/drm-usage-stats.rst 
> >> b/Documentation/gpu/drm-usage-stats.rst
> >> index 2ab32c40e93c..8273a41b2fb0 100644
> >> --- a/Documentation/gpu/drm-usage-stats.rst
> >> +++ b/Documentation/gpu/drm-usage-stats.rst
> >> @@ -21,6 +21,7 @@ File format specification
> >>
> >>   - File shall contain one key value pair per one line of text.
> >>   - Colon character (`:`) must be used to delimit keys and values.
> >> +- Caret (`^`) is also a reserved character.
> >
> > this doesn't solve the problem that led me to drm-$CATEGORY-memory... ;-)
>
> Could you explain or remind me with a link to a previous explanation?

How is userspace supposed to know that "drm-memory-foo" is a memory
type "foo" but drm-memory-foo^size is not memory type "foo^size"?

BR,
-R

> What tool barfs on it and how? Spirit of drm-usage-stats.pl is that for
> both drm-engine-: and drm-memory-: generic userspace is
> supposed to take the whole  as opaque and just enumerate all it finds.
>
> Gputop manages to do that with engines names it knows _nothing_ about.
> If it worked with memory regions it would just show the list of new
> regions as for example "system^resident", "system^active". Then tools
> can be extended to understand it better and provide a sub-breakdown if
> wanted.
>
> Ugly or not, we can change from caret to something nicer; unless I am
> missing something it works, no?
>
> Regards,
>
> Tvrtko
>
> >
> > (also, it is IMHO rather ugly)
> >
> > BR,
> > -R
> >
> >>   - All keys shall be prefixed with `drm-`.
> >>   - Whitespace between the delimiter and first non-whitespace character 
> >> shall be
> >> ignored when parsing.
> >> @@ -105,6 +106,17 @@ object belong to this client, in the respective 
> >> memory region.
> >>   Default unit shall be bytes with optional unit specifiers of 'KiB' or 
> >> 'MiB'
> >>   indicating kibi- or mebi-bytes.
> >>
> >> +- drm-memory-^size:   [KiB|MiB]
> >> +- drm-memory-^shared: [KiB|MiB]
> >> +- drm-memory-^resident:   [KiB|MiB]
> >> +- drm-memory-^purgeable:  [KiB|MiB]
> >> +- drm-memory-^active: [KiB|MiB]
> >> +
> >> +Resident category is identical to the drm-memory- key and two should 
> >> be
> >> +mutually exclusive.
> >> +
> >> +TODO more description text...
> >> +
> >>   - drm-cycles- 
> >>
> >>   Engine identifier string must be the same as the one specified in the
> >> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >> index 37b4f76a5191..e202f79e816d 100644
> >> --- a/drivers/gpu/drm/drm_file.c
> >> +++ b/drivers/gpu/drm/drm_file.c
> >> @@ -42,6 +42,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>
> >>   #include "drm_crtc_internal.h"
> >> @@ -871,6 +872,54 @@ void drm_send_event(struct drm_device *dev, struct 
> >> drm_pending_event *e)
> >>   }
> >>   EXPORT_SYMBOL(drm_send_event);
> >>
> >> +static void
> >> +print_stat(struct drm_printer *p, const char *stat, const char *region, 
> >> u64 sz)
> >> +{
> >> +   const char *units[] = {"", " KiB", " MiB"};
> >> +   unsigned int u;
> >> +
> >> +   if (sz == ~0ull) /* Not supported by the driver. */
> >> +   return;
> >> +
> >> +   for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> >> +   if (sz < SZ_1K)
> >> +   break;
> >> +   sz = div_u64(sz, SZ_1K);
> >> +   }
> >> +
> >> +   drm_printf(p, "drm-memory-%s^%s:\t%llu%s\n&qu

Re: [Intel-gfx] [RFC 3/6] drm: Add fdinfo memory stats

2023-04-17 Thread Rob Clark
On Mon, Apr 17, 2023 at 8:56 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Add support to dump GEM stats to fdinfo.
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  Documentation/gpu/drm-usage-stats.rst | 12 +++
>  drivers/gpu/drm/drm_file.c| 52 +++
>  include/drm/drm_drv.h |  7 
>  include/drm/drm_file.h|  8 +
>  4 files changed, 79 insertions(+)
>
> diff --git a/Documentation/gpu/drm-usage-stats.rst 
> b/Documentation/gpu/drm-usage-stats.rst
> index 2ab32c40e93c..8273a41b2fb0 100644
> --- a/Documentation/gpu/drm-usage-stats.rst
> +++ b/Documentation/gpu/drm-usage-stats.rst
> @@ -21,6 +21,7 @@ File format specification
>
>  - File shall contain one key value pair per one line of text.
>  - Colon character (`:`) must be used to delimit keys and values.
> +- Caret (`^`) is also a reserved character.

this doesn't solve the problem that led me to drm-$CATEGORY-memory... ;-)

(also, it is IMHO rather ugly)

BR,
-R

>  - All keys shall be prefixed with `drm-`.
>  - Whitespace between the delimiter and first non-whitespace character shall 
> be
>ignored when parsing.
> @@ -105,6 +106,17 @@ object belong to this client, in the respective memory 
> region.
>  Default unit shall be bytes with optional unit specifiers of 'KiB' or 'MiB'
>  indicating kibi- or mebi-bytes.
>
> +- drm-memory-^size:   [KiB|MiB]
> +- drm-memory-^shared: [KiB|MiB]
> +- drm-memory-^resident:   [KiB|MiB]
> +- drm-memory-^purgeable:  [KiB|MiB]
> +- drm-memory-^active: [KiB|MiB]
> +
> +Resident category is identical to the drm-memory- key and two should be
> +mutually exclusive.
> +
> +TODO more description text...
> +
>  - drm-cycles- 
>
>  Engine identifier string must be the same as the one specified in the
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 37b4f76a5191..e202f79e816d 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #include "drm_crtc_internal.h"
> @@ -871,6 +872,54 @@ void drm_send_event(struct drm_device *dev, struct 
> drm_pending_event *e)
>  }
>  EXPORT_SYMBOL(drm_send_event);
>
> +static void
> +print_stat(struct drm_printer *p, const char *stat, const char *region, u64 
> sz)
> +{
> +   const char *units[] = {"", " KiB", " MiB"};
> +   unsigned int u;
> +
> +   if (sz == ~0ull) /* Not supported by the driver. */
> +   return;
> +
> +   for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> +   if (sz < SZ_1K)
> +   break;
> +   sz = div_u64(sz, SZ_1K);
> +   }
> +
> +   drm_printf(p, "drm-memory-%s^%s:\t%llu%s\n",
> +  region, stat, sz, units[u]);
> +}
> +
> +static void print_memory_stats(struct drm_printer *p, struct drm_file *file)
> +{
> +   struct drm_device *dev = file->minor->dev;
> +   struct drm_fdinfo_memory_stat *stats;
> +   unsigned int num, i;
> +   char **regions;
> +
> +   regions = dev->driver->query_fdinfo_memory_regions(dev, &num);
> +
> +   stats = kcalloc(num, sizeof(*stats), GFP_KERNEL);
> +   if (!stats)
> +   return;
> +
> +   dev->driver->query_fdinfo_memory_stats(file, stats);
> +
> +   for (i = 0; i < num; i++) {
> +   if (!regions[i]) /* Allow sparse name arrays. */
> +   continue;
> +
> +   print_stat(p, "size", regions[i], stats[i].size);
> +   print_stat(p, "shared", regions[i], stats[i].shared);
> +   print_stat(p, "resident", regions[i], stats[i].resident);
> +   print_stat(p, "purgeable", regions[i], stats[i].purgeable);
> +   print_stat(p, "active", regions[i], stats[i].active);
> +   }
> +
> +   kfree(stats);
> +}
> +
>  /**
>   * drm_show_fdinfo - helper for drm file fops
>   * @seq_file: output stream
> @@ -900,6 +949,9 @@ void drm_show_fdinfo(struct seq_file *m, struct file *f)
>
> if (dev->driver->show_fdinfo)
> dev->driver->show_fdinfo(&p, file);
> +
> +   if (dev->driver->query_fdinfo_memory_regions)
> +   print_memory_stats(&p, file);
>  }
>  EXPORT_SYMBOL(drm_show_fdinfo);
>
> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
> index 89e2706cac56..ccc1cd98d2aa 100644
> --- a/include/drm/drm_drv.h
> +++ b/include/drm/drm_drv.h
> @@ -35,6 +35,7 @@
>  #include 
>
>  struct drm_file;
> +struct drm_fdinfo_memory_stat;
>  struct drm_gem_object;
>  struct drm_master;
>  struct drm_minor;
> @@ -408,6 +409,12 @@ struct drm_driver {
>  */
> void (*show_fdinfo)(struct drm_printer *p, struct drm_file *f);
>
> +   char ** (*query_fdinfo_memory_regions)(struct drm_device *dev,
> +  unsigned int *num);
> +
> +   void (*query_fdinfo_memory_stats)(struct drm_file *f,
> +   

Re: [Intel-gfx] [PATCH v4 4/6] drm/i915: Switch to fdinfo helper

2023-04-13 Thread Rob Clark
On Thu, Apr 13, 2023 at 6:07 AM Tvrtko Ursulin
 wrote:
>
>
> On 12/04/2023 23:42, Rob Clark wrote:
> > From: Rob Clark 
>
> There is more to do here to remove my client->id fully (it would now be
> dead code), so maybe easiest if you drop this patch and I do it after you
> land this and it propagates to our branches? I'd like to avoid pain with
> conflicts if possible..

That is fine by me

BR,
-R

> Regards,
>
> Tvrtko
>
> >
> > Signed-off-by: Rob Clark 
> > ---
> >   drivers/gpu/drm/i915/i915_driver.c |  3 ++-
> >   drivers/gpu/drm/i915/i915_drm_client.c | 18 +-
> >   drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
> >   3 files changed, 8 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> > b/drivers/gpu/drm/i915/i915_driver.c
> > index db7a86def7e2..0d91f85f8b97 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops 
> > = {
> >   .compat_ioctl = i915_ioc32_compat_ioctl,
> >   .llseek = noop_llseek,
> >   #ifdef CONFIG_PROC_FS
> > - .show_fdinfo = i915_drm_client_fdinfo,
> > + .show_fdinfo = drm_show_fdinfo,
> >   #endif
> >   };
> >
> > @@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
> >   .open = i915_driver_open,
> >   .lastclose = i915_driver_lastclose,
> >   .postclose = i915_driver_postclose,
> > + .show_fdinfo = i915_drm_client_fdinfo,
> >
> >   .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> >   .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
> > b/drivers/gpu/drm/i915/i915_drm_client.c
> > index b09d1d386574..4a77e5e47f79 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.c
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.c
> > @@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, 
> > unsigned int class)
> >   }
> >
> >   static void
> > -show_client_class(struct seq_file *m,
> > +show_client_class(struct drm_printer *p,
> > struct i915_drm_client *client,
> > unsigned int class)
> >   {
> > @@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
> >   rcu_read_unlock();
> >
> >   if (capacity)
> > - seq_printf(m, "drm-engine-%s:\t%llu ns\n",
> > + drm_printf(p, "drm-engine-%s:\t%llu ns\n",
> >  uabi_class_names[class], total);
> >
> >   if (capacity > 1)
> > - seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
> > + drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
> >  uabi_class_names[class],
> >  capacity);
> >   }
> >
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
> >   {
> > - struct drm_file *file = f->private_data;
> >   struct drm_i915_file_private *file_priv = file->driver_priv;
> >   struct drm_i915_private *i915 = file_priv->dev_priv;
> >   struct i915_drm_client *client = file_priv->client;
> > - struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
> >   unsigned int i;
> >
> >   /*
> > @@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct 
> > file *f)
> >* **
> >*/
> >
> > - seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
> > - seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
> > -pci_domain_nr(pdev->bus), pdev->bus->number,
> > -PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
> > - seq_printf(m, "drm-client-id:\t%u\n", client->id);
> > -
> >   /*
> >* Temporarily skip showing client engine information with GuC 
> > submission till
> >* fetching engine busyness is implemented in the GuC submission 
> > backend
> > @@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct 
> > file *f)
> >   return;
> >
> >   for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
> > - show_client_class(m, client, i);
> > + show_client_class(p, client, i);
> >   }
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
> > b/drivers/gpu/drm/i915/i915_drm_client.h
> > index 69496af996d9..ef85fef45de5 100644
> > --- a/drivers/gpu/drm/i915/i915_drm_client.h
> > +++ b/drivers/gpu/drm/i915/i915_drm_client.h
> > @@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct 
> > i915_drm_client *client)
> >   struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients 
> > *clients);
> >
> >   #ifdef CONFIG_PROC_FS
> > -void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
> > +void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
> >   #endif
> >
> >   void i915_drm_clients_fini(struct i915_drm_clients *clients);


[Intel-gfx] [PATCH v4 4/6] drm/i915: Switch to fdinfo helper

2023-04-12 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_driver.c |  3 ++-
 drivers/gpu/drm/i915/i915_drm_client.c | 18 +-
 drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c 
b/drivers/gpu/drm/i915/i915_driver.c
index db7a86def7e2..0d91f85f8b97 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
.compat_ioctl = i915_ioc32_compat_ioctl,
.llseek = noop_llseek,
 #ifdef CONFIG_PROC_FS
-   .show_fdinfo = i915_drm_client_fdinfo,
+   .show_fdinfo = drm_show_fdinfo,
 #endif
 };
 
@@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
.open = i915_driver_open,
.lastclose = i915_driver_lastclose,
.postclose = i915_driver_postclose,
+   .show_fdinfo = i915_drm_client_fdinfo,
 
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index b09d1d386574..4a77e5e47f79 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned 
int class)
 }
 
 static void
-show_client_class(struct seq_file *m,
+show_client_class(struct drm_printer *p,
  struct i915_drm_client *client,
  unsigned int class)
 {
@@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
rcu_read_unlock();
 
if (capacity)
-   seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+   drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   uabi_class_names[class], total);
 
if (capacity > 1)
-   seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
+   drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
   uabi_class_names[class],
   capacity);
 }
 
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-   struct drm_file *file = f->private_data;
struct drm_i915_file_private *file_priv = file->driver_priv;
struct drm_i915_private *i915 = file_priv->dev_priv;
struct i915_drm_client *client = file_priv->client;
-   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
unsigned int i;
 
/*
@@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct 
file *f)
 * **
 */
 
-   seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
-   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
-  pci_domain_nr(pdev->bus), pdev->bus->number,
-  PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
-   seq_printf(m, "drm-client-id:\t%u\n", client->id);
-
/*
 * Temporarily skip showing client engine information with GuC 
submission till
 * fetching engine busyness is implemented in the GuC submission backend
@@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file 
*f)
return;
 
for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
-   show_client_class(m, client, i);
+   show_client_class(p, client, i);
 }
 #endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index 69496af996d9..ef85fef45de5 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client 
*client)
 struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
 
 #ifdef CONFIG_PROC_FS
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
 #endif
 
 void i915_drm_clients_fini(struct i915_drm_clients *clients);
-- 
2.39.2



[Intel-gfx] [PATCH v4 0/6] drm: fdinfo memory stats

2023-04-12 Thread Rob Clark
From: Rob Clark 

Similar motivation to another recent attempt[1], but with an effort to
have some shared code for this, as well as documentation.

It is probably a bit UMA-centric; I guess devices with VRAM might want
some placement stats as well.  But this seems like a reasonable start.

Basic gputop support: https://patchwork.freedesktop.org/series/116236/
And already nvtop support: https://github.com/Syllo/nvtop/pull/204

[1] https://patchwork.freedesktop.org/series/112397/

Rob Clark (6):
  drm: Add common fdinfo helper
  drm/msm: Switch to fdinfo helper
  drm/amdgpu: Switch to fdinfo helper
  drm/i915: Switch to fdinfo helper
  drm: Add fdinfo memory stats
  drm/msm: Add memory stats to fdinfo

 Documentation/gpu/drm-usage-stats.rst  |  31 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  16 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |   2 +-
 drivers/gpu/drm/drm_file.c | 111 +
 drivers/gpu/drm/i915/i915_driver.c |   3 +-
 drivers/gpu/drm/i915/i915_drm_client.c |  18 +---
 drivers/gpu/drm/i915/i915_drm_client.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c  |  11 +-
 drivers/gpu/drm/msm/msm_gem.c  |  15 +++
 drivers/gpu/drm/msm/msm_gpu.c  |   2 -
 include/drm/drm_drv.h  |   7 ++
 include/drm/drm_file.h |   5 +
 include/drm/drm_gem.h  |  30 ++
 14 files changed, 220 insertions(+), 36 deletions(-)

-- 
2.39.2



[Intel-gfx] [PATCH v3 4/7] drm/i915: Switch to fdinfo helper

2023-04-11 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_driver.c |  3 ++-
 drivers/gpu/drm/i915/i915_drm_client.c | 18 +-
 drivers/gpu/drm/i915/i915_drm_client.h |  2 +-
 3 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c 
b/drivers/gpu/drm/i915/i915_driver.c
index db7a86def7e2..37eacaa3064b 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1696,7 +1696,7 @@ static const struct file_operations i915_driver_fops = {
.compat_ioctl = i915_ioc32_compat_ioctl,
.llseek = noop_llseek,
 #ifdef CONFIG_PROC_FS
-   .show_fdinfo = i915_drm_client_fdinfo,
+   .show_fdinfo = drm_fop_show_fdinfo,
 #endif
 };
 
@@ -1796,6 +1796,7 @@ static const struct drm_driver i915_drm_driver = {
.open = i915_driver_open,
.lastclose = i915_driver_lastclose,
.postclose = i915_driver_postclose,
+   .show_fdinfo = i915_drm_client_fdinfo,
 
.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c 
b/drivers/gpu/drm/i915/i915_drm_client.c
index b09d1d386574..4a77e5e47f79 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -101,7 +101,7 @@ static u64 busy_add(struct i915_gem_context *ctx, unsigned 
int class)
 }
 
 static void
-show_client_class(struct seq_file *m,
+show_client_class(struct drm_printer *p,
  struct i915_drm_client *client,
  unsigned int class)
 {
@@ -117,22 +117,20 @@ show_client_class(struct seq_file *m,
rcu_read_unlock();
 
if (capacity)
-   seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+   drm_printf(p, "drm-engine-%s:\t%llu ns\n",
   uabi_class_names[class], total);
 
if (capacity > 1)
-   seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
+   drm_printf(p, "drm-engine-capacity-%s:\t%u\n",
   uabi_class_names[class],
   capacity);
 }
 
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f)
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file)
 {
-   struct drm_file *file = f->private_data;
struct drm_i915_file_private *file_priv = file->driver_priv;
struct drm_i915_private *i915 = file_priv->dev_priv;
struct i915_drm_client *client = file_priv->client;
-   struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
unsigned int i;
 
/*
@@ -141,12 +139,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct 
file *f)
 * **
 */
 
-   seq_printf(m, "drm-driver:\t%s\n", i915->drm.driver->name);
-   seq_printf(m, "drm-pdev:\t%04x:%02x:%02x.%d\n",
-  pci_domain_nr(pdev->bus), pdev->bus->number,
-  PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn));
-   seq_printf(m, "drm-client-id:\t%u\n", client->id);
-
/*
 * Temporarily skip showing client engine information with GuC 
submission till
 * fetching engine busyness is implemented in the GuC submission backend
@@ -155,6 +147,6 @@ void i915_drm_client_fdinfo(struct seq_file *m, struct file 
*f)
return;
 
for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++)
-   show_client_class(m, client, i);
+   show_client_class(p, client, i);
 }
 #endif
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h 
b/drivers/gpu/drm/i915/i915_drm_client.h
index 69496af996d9..ef85fef45de5 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -60,7 +60,7 @@ static inline void i915_drm_client_put(struct i915_drm_client 
*client)
 struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients);
 
 #ifdef CONFIG_PROC_FS
-void i915_drm_client_fdinfo(struct seq_file *m, struct file *f);
+void i915_drm_client_fdinfo(struct drm_printer *p, struct drm_file *file);
 #endif
 
 void i915_drm_clients_fini(struct i915_drm_clients *clients);
-- 
2.39.2



[Intel-gfx] [PATCH v3 0/7] drm: fdinfo memory stats

2023-04-11 Thread Rob Clark
From: Rob Clark 

Similar motivation to another recent attempt[1], but with an effort to
have some shared code for this, as well as documentation.

It is probably a bit UMA-centric; I guess devices with VRAM might want
some placement stats as well.  But this seems like a reasonable start.

Basic gputop support: https://patchwork.freedesktop.org/series/116236/
And already nvtop support: https://github.com/Syllo/nvtop/pull/204

[1] https://patchwork.freedesktop.org/series/112397/

Rob Clark (7):
  drm: Add common fdinfo helper
  drm/msm: Switch to fdinfo helper
  drm/amdgpu: Switch to fdinfo helper
  drm/i915: Switch to fdinfo helper
  drm/etnaviv: Switch to fdinfo helper
  drm: Add fdinfo memory stats
  drm/msm: Add memory stats to fdinfo

 Documentation/gpu/drm-usage-stats.rst  |  21 
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  16 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.h |   2 +-
 drivers/gpu/drm/drm_file.c | 115 +
 drivers/gpu/drm/etnaviv/etnaviv_drv.c  |  10 +-
 drivers/gpu/drm/i915/i915_driver.c |   3 +-
 drivers/gpu/drm/i915/i915_drm_client.c |  18 +---
 drivers/gpu/drm/i915/i915_drm_client.h |   2 +-
 drivers/gpu/drm/msm/msm_drv.c  |  11 +-
 drivers/gpu/drm/msm/msm_gem.c  |  15 +++
 drivers/gpu/drm/msm/msm_gpu.c  |   2 -
 include/drm/drm_drv.h  |   7 ++
 include/drm/drm_file.h |   5 +
 include/drm/drm_gem.h  |  19 
 15 files changed, 208 insertions(+), 41 deletions(-)

-- 
2.39.2



Re: [Intel-gfx] [igt-dev] [PATCH i-g-t 8/8] gputop: Basic vendor agnostic GPU top tool

2023-04-06 Thread Rob Clark
On Thu, Apr 6, 2023 at 4:08 AM Tvrtko Ursulin
 wrote:
>
>
> On 05/04/2023 18:57, Rob Clark wrote:
> > On Tue, Jan 31, 2023 at 3:33 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Rudimentary vendor agnostic example of how lib_igt_drm_clients can be used
> >> to display a sorted by card and usage list of processes using GPUs.
> >>
> >> Borrows a bit of code from intel_gpu_top but for now omits the fancy
> >> features like interactive functionality, card selection, client
> >> aggregation, sort modes, JSON output  and pretty engine names. Also no
> >> support for global GPU or system metrics.
> >>
> >> On the other hand it shows clients from all DRM cards which
> >> intel_gpu_top does not do.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> Cc: Rob Clark 
> >> Cc: Christian König 
> >> Acked-by: Christian König 
> >
> > Reviewed-by: Rob Clark 
>
> Presumably for 8/8 only?
>
> The rest of the series does not apply any more by now. I need to rebase..

I didn't look closely at the rest of the series (was kinda assuming
that was mostly just moving things around).. but I see you rebased it
so I can take a look.

BR,
-R

> Regards,
>
> Tvrtko
>
> >
> >> ---
> >>   tools/gputop.c| 260 ++
> >>   tools/meson.build |   5 +
> >>   2 files changed, 265 insertions(+)
> >>   create mode 100644 tools/gputop.c
> >>
> >> diff --git a/tools/gputop.c b/tools/gputop.c
> >> new file mode 100644
> >> index ..d259cac1ab17
> >> --- /dev/null
> >> +++ b/tools/gputop.c
> >> @@ -0,0 +1,260 @@
> >> +// SPDX-License-Identifier: MIT
> >> +/*
> >> + * Copyright © 2022 Intel Corporation
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "igt_drm_clients.h"
> >> +#include "igt_drm_fdinfo.h"
> >> +
> >> +static const char *bars[] = { " ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█" 
> >> };
> >> +
> >> +static void n_spaces(const unsigned int n)
> >> +{
> >> +   unsigned int i;
> >> +
> >> +   for (i = 0; i < n; i++)
> >> +   putchar(' ');
> >> +}
> >> +
> >> +static void print_percentage_bar(double percent, int max_len)
> >> +{
> >> +   int bar_len, i, len = max_len - 2;
> >> +   const int w = 8;
> >> +
> >> +   assert(max_len > 0);
> >> +
> >> +   bar_len = ceil(w * percent * len / 100.0);
> >> +   if (bar_len > w * len)
> >> +   bar_len = w * len;
> >> +
> >> +   putchar('|');
> >> +
> >> +   for (i = bar_len; i >= w; i -= w)
> >> +   printf("%s", bars[w]);
> >> +   if (i)
> >> +   printf("%s", bars[i]);
> >> +
> >> +   len -= (bar_len + (w - 1)) / w;
> >> +   n_spaces(len);
> >> +
> >> +   putchar('|');
> >> +}
> >> +
> >> +static int
> >> +print_client_header(struct igt_drm_client *c, int lines, int con_w, int 
> >> con_h,
> >> +   int *engine_w)
> >> +{
> >> +   const char *pidname = "PID   NAME ";
> >> +   int ret, len = strlen(pidname);
> >> +
> >> +   if (lines++ >= con_h || len >= con_w)
> >> +   return lines;
> >> +   printf("\033[7m");
> >> +   ret = printf("DRM minor %u", c->drm_minor);
> >> +   n_spaces(con_w - ret);
> >> +
> >> +   if (lines++ >= con_h)
> >> +   return lines;
> >> +   printf("\n%s", pidname);
> >> +
> >> +   if (c->engines->num_engines) {

Re: [Intel-gfx] [PATCH i-g-t 8/8] gputop: Basic vendor agnostic GPU top tool

2023-04-05 Thread Rob Clark
On Tue, Jan 31, 2023 at 3:33 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Rudimentary vendor agnostic example of how lib_igt_drm_clients can be used
> to display a sorted by card and usage list of processes using GPUs.
>
> Borrows a bit of code from intel_gpu_top but for now omits the fancy
> features like interactive functionality, card selection, client
> aggregation, sort modes, JSON output  and pretty engine names. Also no
> support for global GPU or system metrics.
>
> On the other hand it shows clients from all DRM cards which
> intel_gpu_top does not do.
>
> Signed-off-by: Tvrtko Ursulin 
> Cc: Rob Clark 
> Cc: Christian König 
> Acked-by: Christian König 

Reviewed-by: Rob Clark 

> ---
>  tools/gputop.c| 260 ++
>  tools/meson.build |   5 +
>  2 files changed, 265 insertions(+)
>  create mode 100644 tools/gputop.c
>
> diff --git a/tools/gputop.c b/tools/gputop.c
> new file mode 100644
> index ..d259cac1ab17
> --- /dev/null
> +++ b/tools/gputop.c
> @@ -0,0 +1,260 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "igt_drm_clients.h"
> +#include "igt_drm_fdinfo.h"
> +
> +static const char *bars[] = { " ", "▏", "▎", "▍", "▌", "▋", "▊", "▉", "█" };
> +
> +static void n_spaces(const unsigned int n)
> +{
> +   unsigned int i;
> +
> +   for (i = 0; i < n; i++)
> +   putchar(' ');
> +}
> +
> +static void print_percentage_bar(double percent, int max_len)
> +{
> +   int bar_len, i, len = max_len - 2;
> +   const int w = 8;
> +
> +   assert(max_len > 0);
> +
> +   bar_len = ceil(w * percent * len / 100.0);
> +   if (bar_len > w * len)
> +   bar_len = w * len;
> +
> +   putchar('|');
> +
> +   for (i = bar_len; i >= w; i -= w)
> +   printf("%s", bars[w]);
> +   if (i)
> +   printf("%s", bars[i]);
> +
> +   len -= (bar_len + (w - 1)) / w;
> +   n_spaces(len);
> +
> +   putchar('|');
> +}
> +
> +static int
> +print_client_header(struct igt_drm_client *c, int lines, int con_w, int 
> con_h,
> +   int *engine_w)
> +{
> +   const char *pidname = "PID   NAME ";
> +   int ret, len = strlen(pidname);
> +
> +   if (lines++ >= con_h || len >= con_w)
> +   return lines;
> +   printf("\033[7m");
> +   ret = printf("DRM minor %u", c->drm_minor);
> +   n_spaces(con_w - ret);
> +
> +   if (lines++ >= con_h)
> +   return lines;
> +   printf("\n%s", pidname);
> +
> +   if (c->engines->num_engines) {
> +   unsigned int i;
> +   int width;
> +
> +   *engine_w = width = (con_w - len) / c->engines->num_engines;
> +
> +   for (i = 0; i <= c->engines->max_engine_id; i++) {
> +   const char *name = c->engines->names[i];
> +   int name_len = strlen(name);
> +   int pad = (width - name_len) / 2;
> +   int spaces = width - pad - name_len;
> +
> +   if (!name)
> +   continue;
> +
> +   if (pad < 0 || spaces < 0)
> +   continue;
> +
> +   n_spaces(pad);
> +   printf("%s", name);
> +   n_spaces(spaces);
> +   len += pad + name_len + spaces;
> +   }
> +   }
> +
> +   n_spaces(con_w - len);
> +   printf("\033[0m\n");
> +
> +   return lines;
> +}
> +
> +
> +static bool
> +newheader(const struct igt_drm_client *c, const struct igt_drm_client *pc)
> +{
> +   return !pc || c->drm_minor != pc->drm_minor;
> +}
> +
> +static int
> +print_client(struct igt_drm_client *c, struct igt_drm_client **prevc,
> +double t, int lines, int con_w, int con_h,
> +unsigned int period_us, 

Re: [Intel-gfx] [PATCH] drm/atomic-helper: Don't set deadline for modesets

2023-04-05 Thread Rob Clark
On Wed, Apr 5, 2023 at 6:31 AM Daniel Vetter  wrote:
>
> If the crtc is being switched on or off then the semantics of
> computing the timestamp of the next vblank are somewhat ill-defined.
> And indeed, the code splats with a warning in the timestamp
> computation code. Specifically it hits the check to make sure that
> atomic drivers have fully set up the timing constants in the drm_vblank
> structure, and that's just not the case before the crtc is actually
> on.
>
> For robustness it seems best to just not set deadlines for modesets.
>
> v2: Also skip on inactive crtc (Ville)
>
> Link: 
> https://lore.kernel.org/dri-devel/dfc21f18-7e1e-48f0-c05a-d659b9c90...@linaro.org/
> Fixes: d39e48ca80c0 ("drm/atomic-helper: Set fence deadline for vblank")
> Cc: Ville Syrjälä 
> Cc: Rob Clark 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Reported-by: Dmitry Baryshkov 
> Tested-by: Dmitry Baryshkov  # test patch only
> Cc: Dmitry Baryshkov 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Rob Clark 

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index f21b5a74176c..d44fb9b87ef8 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1528,6 +1528,12 @@ static void set_fence_deadline(struct drm_device *dev,
> for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
> ktime_t v;
>
> +   if (drm_atomic_crtc_needs_modeset(new_crtc_state))
> +   continue;
> +
> +   if (!new_crtc_state->active)
> +   continue;
> +
> if (drm_crtc_next_vblank_start(crtc, &v))
> continue;
>
> --
> 2.40.0
>


Re: [Intel-gfx] [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

2023-04-01 Thread Rob Clark
On Fri, Mar 31, 2023 at 4:30 PM Nathan Chancellor  wrote:
>
> On Fri, Mar 31, 2023 at 03:14:30PM -0700, Rob Clark wrote:
> > On Fri, Mar 31, 2023 at 1:44 PM Nathan Chancellor  wrote:
> > >
> > > Hi Rob,
> > >
> > > On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > > > the next vblank time, and inform the fence(s) of that deadline.
> > > >
> > > > v2: Comment typo fix (danvet)
> > > > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> > > >
> > > > Signed-off-by: Rob Clark 
> > > > Reviewed-by: Daniel Vetter 
> > > > Signed-off-by: Rob Clark 
> > >
> > > I apologize if this has already been reported or fixed, I searched lore
> > > but did not find anything.
> > >
> > > This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
> > > deadline for vblank") in -next causes a hang while running LTP's
> > > read_all test on /proc on my Ampere Altra system (it seems it is hanging
> > > on a pagemap file?). Additionally, I have this splat in dmesg, which
> > > seems related based on the call stack.
> >
> > Hi, I'm not familiar with this hardware.. do you know which drm driver
> > is used?  I can't tell from the call-stack.
>
> I think it is drivers/gpu/drm/ast, as I see ast in lsmod?

Ok, assuming my theory is correct, this should fix it:

https://patchwork.freedesktop.org/series/115992/

BR,
-R

> > > [   20.542591] fbcon: Taking over console
> > > [   20.550772] Unable to handle kernel NULL pointer dereference at 
> > > virtual address 0074
> > > [   20.550776] Mem abort info:
> > > [   20.550777]   ESR = 0x9604
> > > [   20.550779]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [   20.550781]   SET = 0, FnV = 0
> > > [   20.550782]   EA = 0, S1PTW = 0
> > > [   20.550784]   FSC = 0x04: level 0 translation fault
> > > [   20.550785] Data abort info:
> > > [   20.550786]   ISV = 0, ISS = 0x0004
> > > [   20.550788]   CM = 0, WnR = 0
> > > [   20.550789] user pgtable: 4k pages, 48-bit VAs, pgdp=080009d16000
> > > [   20.550791] [0074] pgd=, 
> > > p4d=
> > > [   20.550796] Internal error: Oops: 9604 [#1] SMP
> > > [   20.550800] Modules linked in: ip6table_nat tun nft_fib_inet 
> > > nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 
> > > nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack 
> > > nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr 
> > > sunrpc binfmt_misc vfat fat xfs snd_usb_audio snd_hwdep snd_usbmidi_lib 
> > > snd_seq snd_pcm snd_rawmidi snd_timer snd_seq_device snd soundcore joydev 
> > > mc ipmi_ssif ipmi_devintf ipmi_msghandler arm_spe_pmu arm_cmn arm_dsu_pmu 
> > > arm_dmc620_pmu cppc_cpufreq loop zram crct10dif_ce polyval_ce nvme 
> > > polyval_generic ghash_ce sbsa_gwdt igb nvme_core ast nvme_common 
> > > i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua 
> > > ip6_tables ip_tables dm_multipath fuse
> > > [   20.550869] CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted 
> > > 6.3.0-rc2-8-gd39e48ca80c0 #1
> > > [   20.550872] Hardware name: ADLINK AVA Developer Platform/AVA Developer 
> > > Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> > > [   20.550875] Workqueue: events fbcon_register_existing_fbs
> > > [   20.550884] pstate: 2049 (nzCv daif +PAN -UAO -TCO -DIT -SSBS 
> > > BTYPE=--)
> > > [   20.550888] pc : drm_crtc_next_vblank_start+0x2c/0x98
> > > [   20.550894] lr : drm_atomic_helper_wait_for_fences+0x90/0x240
> > > [   20.550898] sp : 8d583960
> > > [   20.550900] x29: 8d583960 x28: 07ff8fc187b0 x27: 
> > > 
> > > [   20.550904] x26: 07ff99c08c00 x25: 0038 x24: 
> > > 07ff99c0c000
> > > [   20.550908] x23: 0001 x22: 0038 x21: 
> > > 
> > > [   20.550912] x20: 07ff9640a280 x19:  x18: 
> > > 
> > > [   20.550915] x17:  x16: b24d2eece1c0 x15: 
> > > 003038303178
> > > [   20.550919] x14: 303239310048 x13:  x12: 
> > > 00

Re: [Intel-gfx] [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

2023-03-31 Thread Rob Clark
On Fri, Mar 31, 2023 at 1:44 PM Nathan Chancellor  wrote:
>
> Hi Rob,
>
> On Wed, Mar 08, 2023 at 07:53:02AM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
> > For an atomic commit updating a single CRTC (ie. a pageflip) calculate
> > the next vblank time, and inform the fence(s) of that deadline.
> >
> > v2: Comment typo fix (danvet)
> > v3: If there are multiple CRTCs, consider the time of the soonest vblank
> >
> > Signed-off-by: Rob Clark 
> > Reviewed-by: Daniel Vetter 
> > Signed-off-by: Rob Clark 
>
> I apologize if this has already been reported or fixed, I searched lore
> but did not find anything.
>
> This change as commit d39e48ca80c0 ("drm/atomic-helper: Set fence
> deadline for vblank") in -next causes a hang while running LTP's
> read_all test on /proc on my Ampere Altra system (it seems it is hanging
> on a pagemap file?). Additionally, I have this splat in dmesg, which
> seems related based on the call stack.

Hi, I'm not familiar with this hardware.. do you know which drm driver
is used?  I can't tell from the call-stack.

BR,
-R


> [   20.542591] fbcon: Taking over console
> [   20.550772] Unable to handle kernel NULL pointer dereference at virtual 
> address 0074
> [   20.550776] Mem abort info:
> [   20.550777]   ESR = 0x9604
> [   20.550779]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   20.550781]   SET = 0, FnV = 0
> [   20.550782]   EA = 0, S1PTW = 0
> [   20.550784]   FSC = 0x04: level 0 translation fault
> [   20.550785] Data abort info:
> [   20.550786]   ISV = 0, ISS = 0x0004
> [   20.550788]   CM = 0, WnR = 0
> [   20.550789] user pgtable: 4k pages, 48-bit VAs, pgdp=080009d16000
> [   20.550791] [0074] pgd=, p4d=
> [   20.550796] Internal error: Oops: 9604 [#1] SMP
> [   20.550800] Modules linked in: ip6table_nat tun nft_fib_inet nft_fib_ipv4 
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
> nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill 
> ip_set nf_tables nfnetlink qrtr sunrpc binfmt_misc vfat fat xfs snd_usb_audio 
> snd_hwdep snd_usbmidi_lib snd_seq snd_pcm snd_rawmidi snd_timer 
> snd_seq_device snd soundcore joydev mc ipmi_ssif ipmi_devintf ipmi_msghandler 
> arm_spe_pmu arm_cmn arm_dsu_pmu arm_dmc620_pmu cppc_cpufreq loop zram 
> crct10dif_ce polyval_ce nvme polyval_generic ghash_ce sbsa_gwdt igb nvme_core 
> ast nvme_common i2c_algo_bit xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc 
> scsi_dh_alua ip6_tables ip_tables dm_multipath fuse
> [   20.550869] CPU: 12 PID: 469 Comm: kworker/12:1 Not tainted 
> 6.3.0-rc2-8-gd39e48ca80c0 #1
> [   20.550872] Hardware name: ADLINK AVA Developer Platform/AVA Developer 
> Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
> [   20.550875] Workqueue: events fbcon_register_existing_fbs
> [   20.550884] pstate: 2049 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   20.550888] pc : drm_crtc_next_vblank_start+0x2c/0x98
> [   20.550894] lr : drm_atomic_helper_wait_for_fences+0x90/0x240
> [   20.550898] sp : 8d583960
> [   20.550900] x29: 8d583960 x28: 07ff8fc187b0 x27: 
> 
> [   20.550904] x26: 07ff99c08c00 x25: 0038 x24: 
> 07ff99c0c000
> [   20.550908] x23: 0001 x22: 0038 x21: 
> 
> [   20.550912] x20: 07ff9640a280 x19:  x18: 
> 
> [   20.550915] x17:  x16: b24d2eece1c0 x15: 
> 003038303178
> [   20.550919] x14: 303239310048 x13:  x12: 
> 
> [   20.550923] x11:  x10:  x9 : 
> b24d2eeeaca0
> [   20.550926] x8 : 8d583628 x7 : 080077783000 x6 : 
> 
> [   20.550930] x5 : 8d584000 x4 : 07ff99c0c000 x3 : 
> 0130
> [   20.550934] x2 :  x1 : 8d5839c0 x0 : 
> 07ff99c0cc08
> [   20.550937] Call trace:
> [   20.550939]  drm_crtc_next_vblank_start+0x2c/0x98
> [   20.550942]  drm_atomic_helper_wait_for_fences+0x90/0x240
> [   20.550946]  drm_atomic_helper_commit+0xb0/0x188
> [   20.550949]  drm_atomic_commit+0xb0/0xf0
> [   20.550953]  drm_client_modeset_commit_atomic+0x218/0x280
> [   20.550957]  drm_client_modeset_commit_locked+0x64/0x1a0
> [   20.550961]  drm_client_modeset_commit+0x38/0x68
> [   20.550965]  __drm_fb_helper_restore_fbdev_mode_unlocked+0xb0/0xf8
> [   20.550970]  drm_fb_helper_set_par+0x44/0x88
> [   20.550973]  fbcon_init+0x1e0/0x4a8
> [   20.550976]  visual_init+0xbc/0x118
> [   20.550981]  do_bind_con_driver.isra.0+0x194/0x3

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-21 Thread Rob Clark
On Tue, Mar 21, 2023 at 6:24 AM Jonas Ådahl  wrote:
>
> On Fri, Mar 17, 2023 at 08:59:48AM -0700, Rob Clark wrote:
> > On Fri, Mar 17, 2023 at 3:23 AM Jonas Ådahl  wrote:
> > >
> > > On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> > > > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl  wrote:
> > > > >
> > > > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > > > From: Rob Clark 
> > > > > > > > > >
> > > > > > > > > > Add a way to hint to the fence signaler of an upcoming 
> > > > > > > > > > deadline, such as
> > > > > > > > > > vblank, which the fence waiter would prefer not to miss.  
> > > > > > > > > > This is to aid
> > > > > > > > > > the fence signaler in making power management decisions, 
> > > > > > > > > > like boosting
> > > > > > > > > > frequency as the deadline approaches and awareness of 
> > > > > > > > > > missing deadlines
> > > > > > > > > > so that it can be factored into the frequency scaling.
> > > > > > > > > >
> > > > > > > > > > v2: Drop dma_fence::deadline and related logic to filter 
> > > > > > > > > > duplicate
> > > > > > > > > > deadlines, to avoid increasing dma_fence size.  The 
> > > > > > > > > > fence-context
> > > > > > > > > > implementation will need similar logic to track 
> > > > > > > > > > deadlines of all
> > > > > > > > > > the fences on the same timeline.  [ckoenig]
> > > > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > > > v6: More docs
> > > > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > > > > Reviewed-by: Christian König 
> > > > > > > > > > Acked-by: Pekka Paalanen 
> > > > > > > > > > Reviewed-by: Bagas Sanjaya 
> > > > > > > > > > ---
> > > > > > > > >
> > > > > > > > > Hi Rob!
> > > > > > > > >
> > > > > > > > > >  Documentation/driver-api/dma-buf.rst |  6 +++
> > > > > > > > > >  drivers/dma-buf/dma-fence.c  | 59 
> > > > > > > > > > 
> > > > > > > > > >  include/linux/dma-fence.h| 22 +++
> > > > > > > > > >  3 files changed, 87 insertions(+)
> > > > > > > > > >
> > > > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst 
> > > > > > > > > > b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > > > >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > > > :doc: fence signalling annotation
> > > > > > > > > >
> > > > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > >

Re: [Intel-gfx] [PATCH v10 09/15] drm/syncobj: Add deadline support for syncobj waits

2023-03-18 Thread Rob Clark
On Fri, Mar 17, 2023 at 12:08 PM Faith Ekstrand  wrote:
>
>
> On Wed, Mar 8, 2023 at 9:54 AM Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> Add a new flag to let userspace provide a deadline as a hint for syncobj
>> and timeline waits.  This gives a hint to the driver signaling the
>> backing fences about how soon userspace needs it to complete work, so it
>> can adjust GPU frequency accordingly.  An immediate deadline can be
>> given to provide something equivalent to i915 "wait boost".
>>
>> v2: Use absolute u64 ns value for deadline hint, drop cap and driver
>> feature flag in favor of allowing count_handles==0 as a way for
>> userspace to probe kernel for support of new flag
>> v3: More verbose comments about UAPI
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  drivers/gpu/drm/drm_syncobj.c | 64 ---
>>  include/uapi/drm/drm.h| 17 ++
>>  2 files changed, 68 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index 0c2be8360525..a85e9464f07b 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -126,6 +126,11 @@
>>   * synchronize between the two.
>>   * This requirement is inherited from the Vulkan fence API.
>>   *
>> + * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
>> + * a fence deadline hint on the backing fences before waiting, to provide 
>> the
>> + * fence signaler with an appropriate sense of urgency.  The deadline is
>> + * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
>> + *
>>   * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
>>   * handles as well as an array of u64 points and does a host-side wait on 
>> all
>>   * of syncobj fences at the given points simultaneously.
>> @@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct 
>> drm_syncobj **syncobjs,
>>   uint32_t count,
>>   uint32_t flags,
>>   signed long timeout,
>> - uint32_t *idx)
>> + uint32_t *idx,
>> + ktime_t *deadline)
>>  {
>> struct syncobj_wait_entry *entries;
>> struct dma_fence *fence;
>> @@ -1053,6 +1059,15 @@ static signed long 
>> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>> drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
>> }
>>
>> +   if (deadline) {
>> +   for (i = 0; i < count; ++i) {
>> +   fence = entries[i].fence;
>> +   if (!fence)
>> +   continue;
>> +   dma_fence_set_deadline(fence, *deadline);
>> +   }
>> +   }
>> +
>> do {
>> set_current_state(TASK_INTERRUPTIBLE);
>>
>> @@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>>   struct drm_file *file_private,
>>   struct drm_syncobj_wait *wait,
>>   struct drm_syncobj_timeline_wait 
>> *timeline_wait,
>> - struct drm_syncobj **syncobjs, bool 
>> timeline)
>> + struct drm_syncobj **syncobjs, bool 
>> timeline,
>> + ktime_t *deadline)
>>  {
>> signed long timeout = 0;
>> uint32_t first = ~0;
>> @@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>>  NULL,
>>  wait->count_handles,
>>  wait->flags,
>> -timeout, &first);
>> +timeout, &first,
>> +deadline);
>> if (timeout < 0)
>> return timeout;
>> wait->first_signaled = first;
>> @@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>> 
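
The v2 probe mentioned in the changelog above (count_handles==0) could
look roughly like this from userspace - a sketch assuming the UAPI this
patch adds, not code taken from the series:

#include <stdbool.h>
#include <string.h>
#include <xf86drm.h>

#ifndef DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE
#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* from the uapi patch */
#endif

/* A zero-handle wait with the new flag succeeds on kernels that know
 * the flag and fails with EINVAL on older ones. */
static bool syncobj_wait_deadline_supported(int drm_fd)
{
	struct drm_syncobj_wait wait;

	memset(&wait, 0, sizeof(wait));
	wait.count_handles = 0;
	wait.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;

	return drmIoctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait) == 0;
}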

Re: [Intel-gfx] [PATCH v10 09/15] drm/syncobj: Add deadline support for syncobj waits

2023-03-17 Thread Rob Clark
On Fri, Mar 17, 2023 at 12:08 PM Faith Ekstrand  wrote:
>
>
> On Wed, Mar 8, 2023 at 9:54 AM Rob Clark  wrote:
>>
>> From: Rob Clark 
>>
>> Add a new flag to let userspace provide a deadline as a hint for syncobj
>> and timeline waits.  This gives a hint to the driver signaling the
>> backing fences about how soon userspace needs it to complete work, so it
>> can adjust GPU frequency accordingly.  An immediate deadline can be
>> given to provide something equivalent to i915 "wait boost".
>>
>> v2: Use absolute u64 ns value for deadline hint, drop cap and driver
>> feature flag in favor of allowing count_handles==0 as a way for
>> userspace to probe kernel for support of new flag
>> v3: More verbose comments about UAPI
>>
>> Signed-off-by: Rob Clark 
>> ---
>>  drivers/gpu/drm/drm_syncobj.c | 64 ---
>>  include/uapi/drm/drm.h| 17 ++
>>  2 files changed, 68 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index 0c2be8360525..a85e9464f07b 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -126,6 +126,11 @@
>>   * synchronize between the two.
>>   * This requirement is inherited from the Vulkan fence API.
>>   *
>> + * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
>> + * a fence deadline hint on the backing fences before waiting, to provide 
>> the
>> + * fence signaler with an appropriate sense of urgency.  The deadline is
>> + * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
>> + *
>>   * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
>>   * handles as well as an array of u64 points and does a host-side wait on 
>> all
>>   * of syncobj fences at the given points simultaneously.
>> @@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct 
>> drm_syncobj **syncobjs,
>>   uint32_t count,
>>   uint32_t flags,
>>   signed long timeout,
>> - uint32_t *idx)
>> + uint32_t *idx,
>> + ktime_t *deadline)
>>  {
>> struct syncobj_wait_entry *entries;
>> struct dma_fence *fence;
>> @@ -1053,6 +1059,15 @@ static signed long 
>> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>> drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
>> }
>>
>> +   if (deadline) {
>> +   for (i = 0; i < count; ++i) {
>> +   fence = entries[i].fence;
>> +   if (!fence)
>> +   continue;
>> +   dma_fence_set_deadline(fence, *deadline);
>> +   }
>> +   }
>> +
>> do {
>> set_current_state(TASK_INTERRUPTIBLE);
>>
>> @@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>>   struct drm_file *file_private,
>>   struct drm_syncobj_wait *wait,
>>   struct drm_syncobj_timeline_wait 
>> *timeline_wait,
>> - struct drm_syncobj **syncobjs, bool 
>> timeline)
>> + struct drm_syncobj **syncobjs, bool 
>> timeline,
>> + ktime_t *deadline)
>>  {
>> signed long timeout = 0;
>> uint32_t first = ~0;
>> @@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>>  NULL,
>>  wait->count_handles,
>>  wait->flags,
>> -timeout, &first);
>> +timeout, &first,
>> +deadline);
>> if (timeout < 0)
>> return timeout;
>> wait->first_signaled = first;
>> @@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device 
>> *dev,
>> 

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-17 Thread Rob Clark
On Fri, Mar 17, 2023 at 3:23 AM Jonas Ådahl  wrote:
>
> On Thu, Mar 16, 2023 at 09:28:55AM -0700, Rob Clark wrote:
> > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl  wrote:
> > >
> > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl  wrote:
> > > > >
> > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > Add a way to hint to the fence signaler of an upcoming 
> > > > > > > > deadline, such as
> > > > > > > > vblank, which the fence waiter would prefer not to miss.  This 
> > > > > > > > is to aid
> > > > > > > > the fence signaler in making power management decisions, like 
> > > > > > > > boosting
> > > > > > > > frequency as the deadline approaches and awareness of missing 
> > > > > > > > deadlines
> > > > > > > > so that can be factored in to the frequency scaling.
> > > > > > > >
> > > > > > > > v2: Drop dma_fence::deadline and related logic to filter 
> > > > > > > > duplicate
> > > > > > > > deadlines, to avoid increasing dma_fence size.  The 
> > > > > > > > fence-context
> > > > > > > > implementation will need similar logic to track deadlines 
> > > > > > > > of all
> > > > > > > > the fences on the same timeline.  [ckoenig]
> > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > v6: More docs
> > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > > Reviewed-by: Christian König 
> > > > > > > > Acked-by: Pekka Paalanen 
> > > > > > > > Reviewed-by: Bagas Sanjaya 
> > > > > > > > ---
> > > > > > >
> > > > > > > Hi Rob!
> > > > > > >
> > > > > > > >  Documentation/driver-api/dma-buf.rst |  6 +++
> > > > > > > >  drivers/dma-buf/dma-fence.c  | 59 
> > > > > > > > 
> > > > > > > >  include/linux/dma-fence.h| 22 +++
> > > > > > > >  3 files changed, 87 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst 
> > > > > > > > b/Documentation/driver-api/dma-buf.rst
> > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > :doc: fence signalling annotation
> > > > > > > >
> > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > +
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > +   :doc: deadline hints
> > > > > > > > +
> > > > > > > >  DMA Fences Functions Reference
> > > > > > > >  ~~
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c 
> > > > > > > > b/drivers/dma-buf/dma-fence.c
> > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > >

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-16 Thread Rob Clark
On Thu, Mar 16, 2023 at 3:22 PM Sebastian Wick
 wrote:
>
> On Thu, Mar 16, 2023 at 5:29 PM Rob Clark  wrote:
> >
> > On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl  wrote:
> > >
> > > On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > > > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl  wrote:
> > > > >
> > > > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > > > From: Rob Clark 
> > > > > > > >
> > > > > > > > Add a way to hint to the fence signaler of an upcoming 
> > > > > > > > deadline, such as
> > > > > > > > vblank, which the fence waiter would prefer not to miss.  This 
> > > > > > > > is to aid
> > > > > > > > the fence signaler in making power management decisions, like 
> > > > > > > > boosting
> > > > > > > > frequency as the deadline approaches and awareness of missing 
> > > > > > > > deadlines
> > > > > > > > so that it can be factored into the frequency scaling.
> > > > > > > >
> > > > > > > > v2: Drop dma_fence::deadline and related logic to filter 
> > > > > > > > duplicate
> > > > > > > > deadlines, to avoid increasing dma_fence size.  The 
> > > > > > > > fence-context
> > > > > > > > implementation will need similar logic to track deadlines 
> > > > > > > > of all
> > > > > > > > the fences on the same timeline.  [ckoenig]
> > > > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > > > v6: More docs
> > > > > > > > v7: Fix typo, clarify past deadlines
> > > > > > > >
> > > > > > > > Signed-off-by: Rob Clark 
> > > > > > > > Reviewed-by: Christian König 
> > > > > > > > Acked-by: Pekka Paalanen 
> > > > > > > > Reviewed-by: Bagas Sanjaya 
> > > > > > > > ---
> > > > > > >
> > > > > > > Hi Rob!
> > > > > > >
> > > > > > > >  Documentation/driver-api/dma-buf.rst |  6 +++
> > > > > > > >  drivers/dma-buf/dma-fence.c  | 59 
> > > > > > > > 
> > > > > > > >  include/linux/dma-fence.h| 22 +++
> > > > > > > >  3 files changed, 87 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/driver-api/dma-buf.rst 
> > > > > > > > b/Documentation/driver-api/dma-buf.rst
> > > > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > > > >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > :doc: fence signalling annotation
> > > > > > > >
> > > > > > > > +DMA Fence Deadline Hints
> > > > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > > +
> > > > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > > > +   :doc: deadline hints
> > > > > > > > +
> > > > > > > >  DMA Fences Functions Reference
> > > > > > > >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > > >
> > > > > > > > diff --git a/drivers/dma-buf/dma-fence.c 
> > > > > > > > b/drivers/dma-buf/dma-fence.c
> > > > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > 

Re: [Intel-gfx] [PATCH v10 00/15] dma-fence: Deadline awareness

2023-03-16 Thread Rob Clark
On Wed, Mar 8, 2023 at 7:53 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> This series adds a deadline hint to fences, so realtime deadlines
> such as vblank can be communicated to the fence signaller for power/
> frequency management decisions.
>
> This is partially inspired by a trick i915 does, but implemented
> via dma-fence for a couple of reasons:
>
> 1) To continue to be able to use the atomic helpers
> 2) To support cases where display and gpu are different drivers
>
> This iteration adds a dma-fence ioctl to set a deadline (both to
> support igt-tests, and compositors which delay decisions about which
> client buffer to display), and a sw_sync ioctl to read back the
> deadline.  IGT tests utilizing these can be found at:
>
>   
> https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
>

jfwiw, mesa side of this:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21973

BR,
-R

>
> v1: https://patchwork.freedesktop.org/series/93035/
> v2: Move filtering out of later deadlines to fence implementation
> to avoid increasing the size of dma_fence
> v3: Add support in fence-array and fence-chain; Add some uabi to
> support igt tests and userspace compositors.
> v4: Rebase, address various comments, and add syncobj deadline
> support, and sync_file EPOLLPRI based on experience with perf/
> freq issues with clvk compute workloads on i915 (anv)
> v5: Clarify that this is a hint as opposed to a more hard deadline
> guarantee, switch to using u64 ns values in UABI (still absolute
> CLOCK_MONOTONIC values), drop syncobj related cap and driver
> feature flag in favor of allowing count_handles==0 for probing
> kernel support.
> v6: Re-work vblank helper to calculate time of _start_ of vblank,
> and work correctly if the last vblank event was more than a
> frame ago.  Add (mostly unrelated) drm/msm patch which also
> uses the vblank helper.  Use dma_fence_chain_contained().  More
> verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> v7: Fix kbuild complaints about vblank helper.  Add more docs.
> v8: Add patch to surface sync_file UAPI, and more docs updates.
> v9: Drop (E)POLLPRI support.. I still like it, but not essential and
> it can always be revived later.  Fix doc build warning.
> v10: Update 11/15 to handle multiple CRTCs
>
> Rob Clark (15):
>   dma-buf/dma-fence: Add deadline awareness
>   dma-buf/fence-array: Add fence deadline support
>   dma-buf/fence-chain: Add fence deadline support
>   dma-buf/dma-resv: Add a way to set fence deadline
>   dma-buf/sync_file: Surface sync-file uABI
>   dma-buf/sync_file: Add SET_DEADLINE ioctl
>   dma-buf/sw_sync: Add fence deadline support
>   drm/scheduler: Add fence deadline support
>   drm/syncobj: Add deadline support for syncobj waits
>   drm/vblank: Add helper to get next vblank time
>   drm/atomic-helper: Set fence deadline for vblank
>   drm/msm: Add deadline based boost support
>   drm/msm: Add wait-boost support
>   drm/msm/atomic: Switch to vblank_start helper
>   drm/i915: Add deadline based boost support
>
>  Documentation/driver-api/dma-buf.rst| 16 -
>  drivers/dma-buf/dma-fence-array.c   | 11 
>  drivers/dma-buf/dma-fence-chain.c   | 12 
>  drivers/dma-buf/dma-fence.c | 60 ++
>  drivers/dma-buf/dma-resv.c  | 22 +++
>  drivers/dma-buf/sw_sync.c   | 81 +
>  drivers/dma-buf/sync_debug.h|  2 +
>  drivers/dma-buf/sync_file.c | 19 ++
>  drivers/gpu/drm/drm_atomic_helper.c | 37 +++
>  drivers/gpu/drm/drm_syncobj.c   | 64 +++
>  drivers/gpu/drm/drm_vblank.c| 53 +---
>  drivers/gpu/drm/i915/i915_request.c | 20 ++
>  drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
>  drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
>  drivers/gpu/drm/msm/ms
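
For a rough sense of the waiter side (an editor's sketch against only the
dma_fence API added in patch 01/15, not code from the series), a driver
that knows when it needs a fence signalled can forward that urgency
before blocking:

  #include <linux/dma-fence.h>
  #include <linux/jiffies.h>

  static signed long wait_with_deadline(struct dma_fence *fence,
                                        ktime_t deadline)
  {
          /* Hint first, so the signaler can start boosting right away... */
          dma_fence_set_deadline(fence, deadline);

          /* ...then block as usual; the hint changes no wait semantics. */
          return dma_fence_wait_timeout(fence, true, msecs_to_jiffies(100));
  }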

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-16 Thread Rob Clark
On Thu, Mar 16, 2023 at 2:26 AM Jonas Ådahl  wrote:
>
> On Wed, Mar 15, 2023 at 09:19:49AM -0700, Rob Clark wrote:
> > On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl  wrote:
> > >
> > > On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > > > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  wrote:
> > > > >
> > > > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > > > From: Rob Clark 
> > > > > >
> > > > > > Add a way to hint to the fence signaler of an upcoming deadline, 
> > > > > > such as
> > > > > > vblank, which the fence waiter would prefer not to miss.  This is 
> > > > > > to aid
> > > > > > the fence signaler in making power management decisions, like 
> > > > > > boosting
> > > > > > frequency as the deadline approaches and awareness of missing 
> > > > > > deadlines
> > > > > > so that it can be factored into the frequency scaling.
> > > > > >
> > > > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > > > deadlines, to avoid increasing dma_fence size.  The 
> > > > > > fence-context
> > > > > > implementation will need similar logic to track deadlines of all
> > > > > > the fences on the same timeline.  [ckoenig]
> > > > > > v3: Clarify locking wrt. set_deadline callback
> > > > > > v4: Clarify in docs comment that this is a hint
> > > > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > > > v6: More docs
> > > > > > v7: Fix typo, clarify past deadlines
> > > > > >
> > > > > > Signed-off-by: Rob Clark 
> > > > > > Reviewed-by: Christian König 
> > > > > > Acked-by: Pekka Paalanen 
> > > > > > Reviewed-by: Bagas Sanjaya 
> > > > > > ---
> > > > >
> > > > > Hi Rob!
> > > > >
> > > > > >  Documentation/driver-api/dma-buf.rst |  6 +++
> > > > > >  drivers/dma-buf/dma-fence.c  | 59 
> > > > > > 
> > > > > >  include/linux/dma-fence.h| 22 +++
> > > > > >  3 files changed, 87 insertions(+)
> > > > > >
> > > > > > diff --git a/Documentation/driver-api/dma-buf.rst 
> > > > > > b/Documentation/driver-api/dma-buf.rst
> > > > > > index 622b8156d212..183e480d8cea 100644
> > > > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > > > >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > :doc: fence signalling annotation
> > > > > >
> > > > > > +DMA Fence Deadline Hints
> > > > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > > +
> > > > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > > > +   :doc: deadline hints
> > > > > > +
> > > > > >  DMA Fences Functions Reference
> > > > > >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > > >
> > > > > > diff --git a/drivers/dma-buf/dma-fence.c 
> > > > > > b/drivers/dma-buf/dma-fence.c
> > > > > > index 0de0482cd36e..f177c56269bb 100644
> > > > > > --- a/drivers/dma-buf/dma-fence.c
> > > > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence 
> > > > > > **fences, uint32_t count,
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > > > >
> > > > > > +/**
> > > > > > + * DOC: deadline hints
> > > > > > + *
> > > > > > + * In an ideal world, it would be possible to pipeline a workload 
> > > > > > sufficiently
> > > > > > + * that a utilization based device frequency governor could arrive 
> > > > > > at a minimum
> > > > > > + * frequency that meets the requirements of the use-case, in order 
> > > > > > to minimize
> > > > > > + * power consumption.  But 

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-15 Thread Rob Clark
On Wed, Mar 15, 2023 at 6:53 AM Jonas Ådahl  wrote:
>
> On Fri, Mar 10, 2023 at 09:38:18AM -0800, Rob Clark wrote:
> > On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  wrote:
> > >
> > > On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > > > vblank, which the fence waiter would prefer not to miss.  This is to aid
> > > > the fence signaler in making power management decisions, like boosting
> > > > frequency as the deadline approaches and awareness of missing deadlines
> > > > so that it can be factored into the frequency scaling.
> > > >
> > > > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > > > deadlines, to avoid increasing dma_fence size.  The fence-context
> > > > implementation will need similar logic to track deadlines of all
> > > > the fences on the same timeline.  [ckoenig]
> > > > v3: Clarify locking wrt. set_deadline callback
> > > > v4: Clarify in docs comment that this is a hint
> > > > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > > > v6: More docs
> > > > v7: Fix typo, clarify past deadlines
> > > >
> > > > Signed-off-by: Rob Clark 
> > > > Reviewed-by: Christian König 
> > > > Acked-by: Pekka Paalanen 
> > > > Reviewed-by: Bagas Sanjaya 
> > > > ---
> > >
> > > Hi Rob!
> > >
> > > >  Documentation/driver-api/dma-buf.rst |  6 +++
> > > >  drivers/dma-buf/dma-fence.c  | 59 
> > > >  include/linux/dma-fence.h| 22 +++
> > > >  3 files changed, 87 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/dma-buf.rst 
> > > > b/Documentation/driver-api/dma-buf.rst
> > > > index 622b8156d212..183e480d8cea 100644
> > > > --- a/Documentation/driver-api/dma-buf.rst
> > > > +++ b/Documentation/driver-api/dma-buf.rst
> > > > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> > > >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > :doc: fence signalling annotation
> > > >
> > > > +DMA Fence Deadline Hints
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > > > +   :doc: deadline hints
> > > > +
> > > >  DMA Fences Functions Reference
> > > >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >
> > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > > > index 0de0482cd36e..f177c56269bb 100644
> > > > --- a/drivers/dma-buf/dma-fence.c
> > > > +++ b/drivers/dma-buf/dma-fence.c
> > > > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence 
> > > > **fences, uint32_t count,
> > > >  }
> > > >  EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> > > >
> > > > +/**
> > > > + * DOC: deadline hints
> > > > + *
> > > > + * In an ideal world, it would be possible to pipeline a workload 
> > > > sufficiently
> > > > + * that a utilization based device frequency governor could arrive at 
> > > > a minimum
> > > > + * frequency that meets the requirements of the use-case, in order to 
> > > > minimize
> > > > + * power consumption.  But in the real world there are many workloads 
> > > > which
> > > > + * defy this ideal.  For example, but not limited to:
> > > > + *
> > > > + * * Workloads that ping-pong between device and CPU, with alternating 
> > > > periods
> > > > + *   of CPU waiting for device, and device waiting on CPU.  This can 
> > > > result in
> > > > + *   devfreq and cpufreq seeing idle time in their respective domains
> > > > + *   and, as a result, reduce frequency.
> > > > + *
> > > > + * * Workloads that interact with a periodic time based deadline, such 
> > > > as double
> > > > + *   buffered GPU rendering vs vblank sync'd page flipping.  In this 
> > > > scenario,
> > > > + *   missing a vblank deadline results in an *increase* in idle time 
> > > > on the GPU
> > > > + *   (since it has to wait an additional vblank period), sending a 
> > > >

Re: [Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-10 Thread Rob Clark
On Fri, Mar 10, 2023 at 7:45 AM Jonas Ådahl  wrote:
>
> On Wed, Mar 08, 2023 at 07:52:52AM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Add a way to hint to the fence signaler of an upcoming deadline, such as
> > vblank, which the fence waiter would prefer not to miss.  This is to aid
> > the fence signaler in making power management decisions, like boosting
> > frequency as the deadline approaches and awareness of missing deadlines
> > so that it can be factored into the frequency scaling.
> >
> > v2: Drop dma_fence::deadline and related logic to filter duplicate
> > deadlines, to avoid increasing dma_fence size.  The fence-context
> > implementation will need similar logic to track deadlines of all
> > the fences on the same timeline.  [ckoenig]
> > v3: Clarify locking wrt. set_deadline callback
> > v4: Clarify in docs comment that this is a hint
> > v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > v6: More docs
> > v7: Fix typo, clarify past deadlines
> >
> > Signed-off-by: Rob Clark 
> > Reviewed-by: Christian König 
> > Acked-by: Pekka Paalanen 
> > Reviewed-by: Bagas Sanjaya 
> > ---
>
> Hi Rob!
>
> >  Documentation/driver-api/dma-buf.rst |  6 +++
> >  drivers/dma-buf/dma-fence.c  | 59 
> >  include/linux/dma-fence.h| 22 +++
> >  3 files changed, 87 insertions(+)
> >
> > diff --git a/Documentation/driver-api/dma-buf.rst 
> > b/Documentation/driver-api/dma-buf.rst
> > index 622b8156d212..183e480d8cea 100644
> > --- a/Documentation/driver-api/dma-buf.rst
> > +++ b/Documentation/driver-api/dma-buf.rst
> > @@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
> >  .. kernel-doc:: drivers/dma-buf/dma-fence.c
> > :doc: fence signalling annotation
> >
> > +DMA Fence Deadline Hints
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +.. kernel-doc:: drivers/dma-buf/dma-fence.c
> > +   :doc: deadline hints
> > +
> >  DMA Fences Functions Reference
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> > index 0de0482cd36e..f177c56269bb 100644
> > --- a/drivers/dma-buf/dma-fence.c
> > +++ b/drivers/dma-buf/dma-fence.c
> > @@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, 
> > uint32_t count,
> >  }
> >  EXPORT_SYMBOL(dma_fence_wait_any_timeout);
> >
> > +/**
> > + * DOC: deadline hints
> > + *
> > + * In an ideal world, it would be possible to pipeline a workload 
> > sufficiently
> > + * that a utilization based device frequency governor could arrive at a 
> > minimum
> > + * frequency that meets the requirements of the use-case, in order to 
> > minimize
> > + * power consumption.  But in the real world there are many workloads which
> > + * defy this ideal.  For example, but not limited to:
> > + *
> > + * * Workloads that ping-pong between device and CPU, with alternating 
> > periods
> > + *   of CPU waiting for device, and device waiting on CPU.  This can 
> > result in
> > + *   devfreq and cpufreq seeing idle time in their respective domains
> > + *   and, as a result, reduce frequency.
> > + *
> > + * * Workloads that interact with a periodic time based deadline, such as 
> > double
> > + *   buffered GPU rendering vs vblank sync'd page flipping.  In this 
> > scenario,
> > + *   missing a vblank deadline results in an *increase* in idle time on 
> > the GPU
> > + *   (since it has to wait an additional vblank period), sending a signal 
> > to
> > + *   the GPU's devfreq to reduce frequency, when in fact the opposite is 
> > what is
> > + *   needed.
>
> This is the use case I'd like to get some better understanding about how
> this series intends to work, as the problematic scheduling behavior
> triggered by missed deadlines has plagued compositing display servers
> for a long time.
>
> I apologize, I'm not a GPU driver developer, nor an OpenGL driver
> developer, so I will need some hand holding when it comes to
> understanding exactly what piece of software is responsible for
> communicating what piece of information.
>
> > + *
> > + * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
> > + * The deadline hint provides a way for the waiting driver, or userspace, 
> > to
> > + * convey an appropriate sense of urgency to the signaling driver.
> > + *
> > + * A deadline hint is given in absolute ktime (CLOCK_MONO
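
To put numbers on that second scenario (illustrative figures, not from the
patch): at 60Hz a frame period is ~16.7ms.  If the GPU finishes just after
a vblank and the flip then sits until the next one, devfreq can see roughly
half of every period as idle, clock down, render the following frame even
slower, and miss again.  The deadline hint is what lets the governor tell
this apart from a genuinely light workload.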

[Intel-gfx] [PATCH v10 15/15] drm/i915: Add deadline based boost support

2023-03-08 Thread Rob Clark
From: Rob Clark 

I expect this patch to be replaced by someone who knows i915 better.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence 
*fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.2



[Intel-gfx] [PATCH v10 13/15] drm/msm: Add wait-boost support

2023-03-08 Thread Rob Clark
From: Rob Clark 

Add a way for various userspace waits to signal urgency.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_drv.c | 12 
 drivers/gpu/drm/msm/msm_gem.c |  5 +
 include/uapi/drm/msm_drm.h| 14 --
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index aca48c868c14..f6764a86b2da 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -46,6 +46,7 @@
  * - 1.8.0 - Add MSM_BO_CACHED_COHERENT for supported GPUs (a6xx)
  * - 1.9.0 - Add MSM_SUBMIT_FENCE_SN_IN
  * - 1.10.0 - Add MSM_SUBMIT_BO_NO_IMPLICIT
+ * - 1.11.0 - Add wait boost (MSM_WAIT_FENCE_BOOST, MSM_PREP_BOOST)
  */
 #define MSM_VERSION_MAJOR  1
 #define MSM_VERSION_MINOR  10
@@ -899,7 +900,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
 }
 
 static int wait_fence(struct msm_gpu_submitqueue *queue, uint32_t fence_id,
- ktime_t timeout)
+ ktime_t timeout, uint32_t flags)
 {
struct dma_fence *fence;
int ret;
@@ -929,6 +930,9 @@ static int wait_fence(struct msm_gpu_submitqueue *queue, 
uint32_t fence_id,
if (!fence)
return 0;
 
+   if (flags & MSM_WAIT_FENCE_BOOST)
+   dma_fence_set_deadline(fence, ktime_get());
+
ret = dma_fence_wait_timeout(fence, true, timeout_to_jiffies(&timeout));
if (ret == 0) {
ret = -ETIMEDOUT;
@@ -949,8 +953,8 @@ static int msm_ioctl_wait_fence(struct drm_device *dev, 
void *data,
struct msm_gpu_submitqueue *queue;
int ret;
 
-   if (args->pad) {
-   DRM_ERROR("invalid pad: %08x\n", args->pad);
+   if (args->flags & ~MSM_WAIT_FENCE_FLAGS) {
+   DRM_ERROR("invalid flags: %08x\n", args->flags);
return -EINVAL;
}
 
@@ -961,7 +965,7 @@ static int msm_ioctl_wait_fence(struct drm_device *dev, 
void *data,
if (!queue)
return -ENOENT;
 
-   ret = wait_fence(queue, args->fence, to_ktime(args->timeout));
+   ret = wait_fence(queue, args->fence, to_ktime(args->timeout), 
args->flags);
 
msm_submitqueue_put(queue);
 
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1dee0d18abbb..dd4a0d773f6e 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -846,6 +846,11 @@ int msm_gem_cpu_prep(struct drm_gem_object *obj, uint32_t 
op, ktime_t *timeout)
op & MSM_PREP_NOSYNC ? 0 : timeout_to_jiffies(timeout);
long ret;
 
+   if (op & MSM_PREP_BOOST) {
+   dma_resv_set_deadline(obj->resv, dma_resv_usage_rw(write),
+ ktime_get());
+   }
+
ret = dma_resv_wait_timeout(obj->resv, dma_resv_usage_rw(write),
true,  remain);
if (ret == 0)
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 329100016e7c..dbf0d6f43fa9 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -151,8 +151,13 @@ struct drm_msm_gem_info {
 #define MSM_PREP_READ    0x01
 #define MSM_PREP_WRITE   0x02
 #define MSM_PREP_NOSYNC  0x04
+#define MSM_PREP_BOOST   0x08
 
-#define MSM_PREP_FLAGS   (MSM_PREP_READ | MSM_PREP_WRITE | MSM_PREP_NOSYNC)
+#define MSM_PREP_FLAGS   (MSM_PREP_READ | \
+ MSM_PREP_WRITE | \
+ MSM_PREP_NOSYNC | \
+ MSM_PREP_BOOST | \
+ 0)
 
 struct drm_msm_gem_cpu_prep {
__u32 handle; /* in */
@@ -286,6 +291,11 @@ struct drm_msm_gem_submit {
 
 };
 
+#define MSM_WAIT_FENCE_BOOST   0x0001
+#define MSM_WAIT_FENCE_FLAGS   ( \
+   MSM_WAIT_FENCE_BOOST | \
+   0)
+
 /* The normal way to synchronize with the GPU is just to CPU_PREP on
  * a buffer if you need to access it from the CPU (other cmdstream
  * submission from same or other contexts, PAGE_FLIP ioctl, etc, all
@@ -295,7 +305,7 @@ struct drm_msm_gem_submit {
  */
 struct drm_msm_wait_fence {
__u32 fence;  /* in */
-   __u32 pad;
+   __u32 flags;  /* in, bitmask of MSM_WAIT_FENCE_x */
struct drm_msm_timespec timeout;   /* in */
__u32 queueid; /* in, submitqueue id */
 };
-- 
2.39.2
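
A hedged userspace sketch of the new flag (editor's example, assuming the
updated msm_drm.h from this patch; error handling omitted):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <drm/msm_drm.h>

  static int wait_fence_boosted(int drm_fd, uint32_t queueid, uint32_t fence,
                                struct drm_msm_timespec timeout)
  {
          struct drm_msm_wait_fence req = {
                  .fence   = fence,
                  .flags   = MSM_WAIT_FENCE_BOOST, /* deadline = now */
                  .timeout = timeout,
                  .queueid = queueid,
          };

          return ioctl(drm_fd, DRM_IOCTL_MSM_WAIT_FENCE, &req);
  }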



[Intel-gfx] [PATCH v10 14/15] drm/msm/atomic: Switch to vblank_start helper

2023-03-08 Thread Rob Clark
From: Rob Clark 

Drop our custom thing and switch to drm_crtc_next_vblank_start() for
calculating the time of the start of the next vblank period.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 ---
 drivers/gpu/drm/msm/msm_atomic.c|  8 +---
 drivers/gpu/drm/msm/msm_kms.h   |  8 
 3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c 
b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index a683bd9b5a04..43996aecaf8c 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -411,20 +411,6 @@ static void dpu_kms_disable_commit(struct msm_kms *kms)
pm_runtime_put_sync(&dpu_kms->pdev->dev);
 }
 
-static ktime_t dpu_kms_vsync_time(struct msm_kms *kms, struct drm_crtc *crtc)
-{
-   struct drm_encoder *encoder;
-
-   drm_for_each_encoder_mask(encoder, crtc->dev, 
crtc->state->encoder_mask) {
-   ktime_t vsync_time;
-
if (dpu_encoder_vsync_time(encoder, &vsync_time) == 0)
-   return vsync_time;
-   }
-
-   return ktime_get();
-}
-
 static void dpu_kms_prepare_commit(struct msm_kms *kms,
struct drm_atomic_state *state)
 {
@@ -953,7 +939,6 @@ static const struct msm_kms_funcs kms_funcs = {
.irq = dpu_core_irq,
.enable_commit   = dpu_kms_enable_commit,
.disable_commit  = dpu_kms_disable_commit,
-   .vsync_time  = dpu_kms_vsync_time,
.prepare_commit  = dpu_kms_prepare_commit,
.flush_commit= dpu_kms_flush_commit,
.wait_flush  = dpu_kms_wait_flush,
diff --git a/drivers/gpu/drm/msm/msm_atomic.c b/drivers/gpu/drm/msm/msm_atomic.c
index 1686fbb611fd..c5e71c05f038 100644
--- a/drivers/gpu/drm/msm/msm_atomic.c
+++ b/drivers/gpu/drm/msm/msm_atomic.c
@@ -186,8 +186,7 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
struct msm_kms *kms = priv->kms;
struct drm_crtc *async_crtc = NULL;
unsigned crtc_mask = get_crtc_mask(state);
-   bool async = kms->funcs->vsync_time &&
-   can_do_async(state, &async_crtc);
+   bool async = can_do_async(state, &async_crtc);
 
trace_msm_atomic_commit_tail_start(async, crtc_mask);
 
@@ -231,7 +230,9 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
 
kms->pending_crtc_mask |= crtc_mask;
 
-   vsync_time = kms->funcs->vsync_time(kms, async_crtc);
+   if (drm_crtc_next_vblank_start(async_crtc, &vsync_time))
+   goto fallback;
+
wakeup_time = ktime_sub(vsync_time, ms_to_ktime(1));
 
msm_hrtimer_queue_work(&timer->work, wakeup_time,
@@ -253,6 +254,7 @@ void msm_atomic_commit_tail(struct drm_atomic_state *state)
return;
}
 
+fallback:
/*
 * If there is any async flush pending on updated crtcs, fold
 * them into the current flush.
diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
index f8ed7588928c..086a3f1ff956 100644
--- a/drivers/gpu/drm/msm/msm_kms.h
+++ b/drivers/gpu/drm/msm/msm_kms.h
@@ -59,14 +59,6 @@ struct msm_kms_funcs {
void (*enable_commit)(struct msm_kms *kms);
void (*disable_commit)(struct msm_kms *kms);
 
-   /**
-* If the kms backend supports async commit, it should implement
-* this method to return the time of the next vsync.  This is
-* used to determine a time slightly before vsync, for the async
-* commit timer to run and complete an async commit.
-*/
-   ktime_t (*vsync_time)(struct msm_kms *kms, struct drm_crtc *crtc);
-
/**
 * Prepare for atomic commit.  This is called after any previous
 * (async or otherwise) commit has completed.
-- 
2.39.2



[Intel-gfx] [PATCH v10 12/15] drm/msm: Add deadline based boost support

2023-03-08 Thread Rob Clark
From: Rob Clark 

Track the nearest deadline on a fence timeline and set a timer to expire
shortly before to trigger boost if the fence has not yet been signaled.

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/msm/msm_fence.c | 74 +
 drivers/gpu/drm/msm/msm_fence.h | 20 +
 2 files changed, 94 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 56641408ea74..51b461f32103 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,35 @@
 
 #include "msm_drv.h"
 #include "msm_fence.h"
+#include "msm_gpu.h"
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+   struct msm_drm_private *priv = fctx->dev->dev_private;
+   return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+   struct msm_fence_context *fctx = container_of(t,
+   struct msm_fence_context, deadline_timer);
+
+   kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+   return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+   struct msm_fence_context *fctx = container_of(work,
+   struct msm_fence_context, deadline_work);
+
+   /* If deadline fence has already passed, nothing to do: */
+   if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+   return;
+
+   msm_devfreq_boost(fctx2gpu(fctx), 2);
+}
 
 
 struct msm_fence_context *
@@ -36,6 +65,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile 
uint32_t *fenceptr,
fctx->completed_fence = fctx->last_fence;
*fctx->fenceptr = fctx->last_fence;
 
+   hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+   fctx->deadline_timer.function = deadline_timer;
+
+   kthread_init_work(&fctx->deadline_work, deadline_work);
+
+   fctx->next_deadline = ktime_get();
+
return fctx;
 }
 
@@ -62,6 +98,8 @@ void msm_update_fence(struct msm_fence_context *fctx, 
uint32_t fence)
spin_lock_irqsave(&fctx->spinlock, flags);
if (fence_after(fence, fctx->completed_fence))
fctx->completed_fence = fence;
+   if (msm_fence_completed(fctx, fctx->next_deadline_fence))
+   hrtimer_cancel(&fctx->deadline_timer);
spin_unlock_irqrestore(&fctx->spinlock, flags);
 }
 
@@ -92,10 +130,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
return msm_fence_completed(f->fctx, f->base.seqno);
 }
 
+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct msm_fence *f = to_msm_fence(fence);
+   struct msm_fence_context *fctx = f->fctx;
+   unsigned long flags;
+   ktime_t now;
+
+   spin_lock_irqsave(&fctx->spinlock, flags);
+   now = ktime_get();
+
+   if (ktime_after(now, fctx->next_deadline) ||
+   ktime_before(deadline, fctx->next_deadline)) {
+   fctx->next_deadline = deadline;
+   fctx->next_deadline_fence =
+   max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+   /*
+* Set timer to trigger boost 3ms before deadline, or
+* if we are already less than 3ms before the deadline
+* schedule boost work immediately.
+*/
+   deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+   if (ktime_after(now, deadline)) {
+   kthread_queue_work(fctx2gpu(fctx)->worker,
+   &fctx->deadline_work);
+   } else {
+   hrtimer_start(&fctx->deadline_timer, deadline,
+   HRTIMER_MODE_ABS);
+   }
+   }
+
+   spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
.get_driver_name = msm_fence_get_driver_name,
.get_timeline_name = msm_fence_get_timeline_name,
.signaled = msm_fence_signaled,
+   .set_deadline = msm_fence_set_deadline,
 };
 
 struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 7f1798c54cd1..cdaebfb94f5c 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -52,6 +52,26 @@ struct msm_fence_context {
volatile uint32_t *fenceptr;
 
spinlock_t spinlock;
+
+   /*
+* TODO this doesn't really deal with multiple deadlines, like
+* if userspace got multiple frames ahead.. OTOH atomic updates
+* don't queue, so maybe that is ok
+*/
+
+   /** next_deadline: Time of next deadline */
+   ktime_t next_deadline;
+
+   /**
+* next_deadline_fence:
+*
+* Fence value for next pending deadline.  The deadline 

[Intel-gfx] [PATCH v10 09/15] drm/syncobj: Add deadline support for syncobj waits

2023-03-08 Thread Rob Clark
From: Rob Clark 

Add a new flag to let userspace provide a deadline as a hint for syncobj
and timeline waits.  This gives a hint to the driver signaling the
backing fences about how soon userspace needs it to complete work, so it
can adjust GPU frequency accordingly.  An immediate deadline can be
given to provide something equivalent to i915 "wait boost".

v2: Use absolute u64 ns value for deadline hint, drop cap and driver
feature flag in favor of allowing count_handles==0 as a way for
userspace to probe kernel for support of new flag
v3: More verbose comments about UAPI

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_syncobj.c | 64 ---
 include/uapi/drm/drm.h| 17 ++
 2 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0c2be8360525..a85e9464f07b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -126,6 +126,11 @@
  * synchronize between the two.
  * This requirement is inherited from the Vulkan fence API.
  *
+ * If &DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE is set, the ioctl will also set
+ * a fence deadline hint on the backing fences before waiting, to provide the
+ * fence signaler with an appropriate sense of urgency.  The deadline is
+ * specified as an absolute &CLOCK_MONOTONIC value in units of ns.
+ *
 * Similarly, &DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT takes an array of syncobj
  * handles as well as an array of u64 points and does a host-side wait on all
  * of syncobj fences at the given points simultaneously.
@@ -973,7 +978,8 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
  uint32_t count,
  uint32_t flags,
  signed long timeout,
- uint32_t *idx)
+ uint32_t *idx,
+ ktime_t *deadline)
 {
struct syncobj_wait_entry *entries;
struct dma_fence *fence;
@@ -1053,6 +1059,15 @@ static signed long drm_syncobj_array_wait_timeout(struct 
drm_syncobj **syncobjs,
drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
}
 
+   if (deadline) {
+   for (i = 0; i < count; ++i) {
+   fence = entries[i].fence;
+   if (!fence)
+   continue;
+   dma_fence_set_deadline(fence, *deadline);
+   }
+   }
+
do {
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -1151,7 +1166,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
  struct drm_file *file_private,
  struct drm_syncobj_wait *wait,
  struct drm_syncobj_timeline_wait 
*timeline_wait,
- struct drm_syncobj **syncobjs, bool timeline)
+ struct drm_syncobj **syncobjs, bool timeline,
+ ktime_t *deadline)
 {
signed long timeout = 0;
uint32_t first = ~0;
@@ -1162,7 +1178,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 NULL,
 wait->count_handles,
 wait->flags,
-timeout, &first);
+timeout, &first,
+deadline);
if (timeout < 0)
return timeout;
wait->first_signaled = first;
@@ -1172,7 +1189,8 @@ static int drm_syncobj_array_wait(struct drm_device *dev,
 
u64_to_user_ptr(timeline_wait->points),
 
timeline_wait->count_handles,
 timeline_wait->flags,
-timeout, &first);
+timeout, &first,
+deadline);
if (timeout < 0)
return timeout;
timeline_wait->first_signaled = first;
@@ -1243,17 +1261,22 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void 
*data,
 {
struct drm_syncobj_wait *args = data;
struct drm_syncobj **syncobjs;
+   unsigned possible_flags;
+   ktime_t t, *tp = NULL;
int ret = 0;
 
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
retur
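
The count_handles==0 probe mentioned in the changelog could look roughly
like this from userspace (an editor's sketch assuming this patch's uAPI;
a kernel without it rejects the unknown flag with EINVAL):

  #include <stdbool.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/drm.h>

  static bool have_syncobj_deadline(int drm_fd)
  {
          struct drm_syncobj_wait args;

          memset(&args, 0, sizeof(args));  /* count_handles == 0 */
          args.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE;

          return ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &args) == 0;
  }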

[Intel-gfx] [PATCH v10 10/15] drm/vblank: Add helper to get next vblank time

2023-03-08 Thread Rob Clark
From: Rob Clark 

Will be used in the next commit to set a deadline on fences that an
atomic update is waiting on.

v2: Calculate time at *start* of vblank period, not end
v3: Fix kbuild complaints

Signed-off-by: Rob Clark 
Reviewed-by: Mario Kleiner 
---
 drivers/gpu/drm/drm_vblank.c | 53 ++--
 include/drm/drm_vblank.h |  1 +
 2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 2ff31717a3de..299fa2a19a90 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -844,10 +844,9 @@ bool drm_crtc_vblank_helper_get_vblank_timestamp(struct 
drm_crtc *crtc,
 EXPORT_SYMBOL(drm_crtc_vblank_helper_get_vblank_timestamp);
 
 /**
- * drm_get_last_vbltimestamp - retrieve raw timestamp for the most recent
- * vblank interval
- * @dev: DRM device
- * @pipe: index of CRTC whose vblank timestamp to retrieve
+ * drm_crtc_get_last_vbltimestamp - retrieve raw timestamp for the most
+ *  recent vblank interval
+ * @crtc: CRTC whose vblank timestamp to retrieve
  * @tvblank: Pointer to target time which should receive the timestamp
  * @in_vblank_irq:
  * True when called from drm_crtc_handle_vblank().  Some drivers
@@ -865,10 +864,9 @@ EXPORT_SYMBOL(drm_crtc_vblank_helper_get_vblank_timestamp);
  * True if timestamp is considered to be very precise, false otherwise.
  */
 static bool
-drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,
- ktime_t *tvblank, bool in_vblank_irq)
+drm_crtc_get_last_vbltimestamp(struct drm_crtc *crtc, ktime_t *tvblank,
+  bool in_vblank_irq)
 {
-   struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
bool ret = false;
 
/* Define requested maximum error on timestamps (nanoseconds). */
@@ -876,8 +874,6 @@ drm_get_last_vbltimestamp(struct drm_device *dev, unsigned 
int pipe,
 
/* Query driver if possible and precision timestamping enabled. */
if (crtc && crtc->funcs->get_vblank_timestamp && max_error > 0) {
-   struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
-
ret = crtc->funcs->get_vblank_timestamp(crtc, &max_error,
tvblank, in_vblank_irq);
}
@@ -891,6 +887,15 @@ drm_get_last_vbltimestamp(struct drm_device *dev, unsigned 
int pipe,
return ret;
 }
 
+static bool
+drm_get_last_vbltimestamp(struct drm_device *dev, unsigned int pipe,
+ ktime_t *tvblank, bool in_vblank_irq)
+{
+   struct drm_crtc *crtc = drm_crtc_from_index(dev, pipe);
+
+   return drm_crtc_get_last_vbltimestamp(crtc, tvblank, in_vblank_irq);
+}
+
 /**
  * drm_crtc_vblank_count - retrieve "cooked" vblank counter value
  * @crtc: which counter to retrieve
@@ -980,6 +985,36 @@ u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
 }
 EXPORT_SYMBOL(drm_crtc_vblank_count_and_time);
 
+/**
+ * drm_crtc_next_vblank_start - calculate the time of the next vblank
+ * @crtc: the crtc for which to calculate next vblank time
+ * @vblanktime: pointer to time to receive the next vblank timestamp.
+ *
+ * Calculate the expected time of the start of the next vblank period,
+ * based on time of previous vblank and frame duration
+ */
+int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime)
+{
+   unsigned int pipe = drm_crtc_index(crtc);
+   struct drm_vblank_crtc *vblank = &crtc->dev->vblank[pipe];
+   struct drm_display_mode *mode = &vblank->hwmode;
+   u64 vblank_start;
+
+   if (!vblank->framedur_ns || !vblank->linedur_ns)
+   return -EINVAL;
+
+   if (!drm_crtc_get_last_vbltimestamp(crtc, vblanktime, false))
+   return -EINVAL;
+
+   vblank_start = DIV_ROUND_DOWN_ULL(
+   (u64)vblank->framedur_ns * mode->crtc_vblank_start,
+   mode->crtc_vtotal);
+   *vblanktime  = ktime_add(*vblanktime, ns_to_ktime(vblank_start));
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_crtc_next_vblank_start);
+
 static void send_vblank_event(struct drm_device *dev,
struct drm_pending_vblank_event *e,
u64 seq, ktime_t now)
diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
index 733a3e2d1d10..7f3957943dd1 100644
--- a/include/drm/drm_vblank.h
+++ b/include/drm/drm_vblank.h
@@ -230,6 +230,7 @@ bool drm_dev_has_vblank(const struct drm_device *dev);
 u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
 u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
   ktime_t *vblanktime);
+int drm_crtc_next_vblank_start(struct drm_crtc *crtc, ktime_t *vblanktime);
 void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
   struct drm_pending_vblank_event *e);
 void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
-- 
2.39.2
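
As a worked example of the calculation (editor's numbers for a typical
1080p60 mode: framedur_ns ~= 16666667, crtc_vblank_start = 1084,
crtc_vtotal = 1125): vblank_start = 16666667 * 1084 / 1125 ~= 16059259 ns,
so the predicted start of the next vblank is the previous vblank timestamp
plus ~16.06ms, slightly less than the full 16.67ms frame period, since
vblank begins at the end of the active scanout region.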



[Intel-gfx] [PATCH v10 11/15] drm/atomic-helper: Set fence deadline for vblank

2023-03-08 Thread Rob Clark
From: Rob Clark 

For an atomic commit updating a single CRTC (ie. a pageflip) calculate
the next vblank time, and inform the fence(s) of that deadline.

v2: Comment typo fix (danvet)
v3: If there are multiple CRTCs, consider the time of the soonest vblank

Signed-off-by: Rob Clark 
Reviewed-by: Daniel Vetter 
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/drm_atomic_helper.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..28e3f2c8917e 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1511,6 +1511,41 @@ void drm_atomic_helper_commit_modeset_enables(struct 
drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_commit_modeset_enables);
 
+/*
+ * For atomic updates which touch just a single CRTC, calculate the time of the
+ * next vblank, and inform all the fences of the deadline.
+ */
+static void set_fence_deadline(struct drm_device *dev,
+  struct drm_atomic_state *state)
+{
+   struct drm_crtc *crtc;
+   struct drm_crtc_state *new_crtc_state;
+   struct drm_plane *plane;
+   struct drm_plane_state *new_plane_state;
+   ktime_t vbltime = 0;
+   int i;
+
+   for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) {
+   ktime_t v;
+
+   if (drm_crtc_next_vblank_start(crtc, &v))
+   continue;
+
+   if (!vbltime || ktime_before(v, vbltime))
+   vbltime = v;
+   }
+
+   /* If no CRTCs updated, then nothing to do: */
+   if (!vbltime)
+   return;
+
+   for_each_new_plane_in_state (state, plane, new_plane_state, i) {
+   if (!new_plane_state->fence)
+   continue;
+   dma_fence_set_deadline(new_plane_state->fence, vbltime);
+   }
+}
+
 /**
  * drm_atomic_helper_wait_for_fences - wait for fences stashed in plane state
  * @dev: DRM device
@@ -1540,6 +1575,8 @@ int drm_atomic_helper_wait_for_fences(struct drm_device 
*dev,
struct drm_plane_state *new_plane_state;
int i, ret;
 
+   set_fence_deadline(dev, state);
+
for_each_new_plane_in_state(state, plane, new_plane_state, i) {
if (!new_plane_state->fence)
continue;
-- 
2.39.2
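
For instance (illustrative numbers): with a 60Hz CRTC whose next vblank is
~16.6ms out and a 144Hz CRTC whose next vblank is ~3.2ms out in the same
state, vbltime ends up as the ~3.2ms value and every plane fence in the
update gets that earlier deadline: conservative for the slower display,
but it ensures the faster one is not the vblank that gets missed.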



[Intel-gfx] [PATCH v10 06/15] dma-buf/sync_file: Add SET_DEADLINE ioctl

2023-03-08 Thread Rob Clark
From: Rob Clark 

The initial purpose is for igt tests, but this would also be useful for
compositors that wait until close to vblank deadline to make decisions
about which frame to show.

The igt tests can be found at:

https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline

v2: Clarify the timebase, add link to igt tests
v3: Use u64 value in ns to express deadline.
v4: More doc

Signed-off-by: Rob Clark 
Acked-by: Pekka Paalanen 
---
 drivers/dma-buf/dma-fence.c|  3 ++-
 drivers/dma-buf/sync_file.c| 19 +++
 include/uapi/linux/sync_file.h | 22 ++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index f177c56269bb..74e36f6d05b0 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -933,7 +933,8 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
  *   the GPU's devfreq to reduce frequency, when in fact the opposite is what 
is
  *   needed.
  *
- * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline.
+ * To this end, deadline hint(s) can be set on a &dma_fence via &dma_fence_set_deadline
+ * (or indirectly via userspace facing ioctls like &sync_set_deadline).
  * The deadline hint provides a way for the waiting driver, or userspace, to
  * convey an appropriate sense of urgency to the signaling driver.
  *
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index af57799c86ce..418021cfb87c 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -350,6 +350,22 @@ static long sync_file_ioctl_fence_info(struct sync_file 
*sync_file,
return ret;
 }
 
+static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
+   unsigned long arg)
+{
+   struct sync_set_deadline ts;
+
+   if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
+   return -EFAULT;
+
+   if (ts.pad)
+   return -EINVAL;
+
+   dma_fence_set_deadline(sync_file->fence, ns_to_ktime(ts.deadline_ns));
+
+   return 0;
+}
+
 static long sync_file_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
 {
@@ -362,6 +378,9 @@ static long sync_file_ioctl(struct file *file, unsigned int 
cmd,
case SYNC_IOC_FILE_INFO:
return sync_file_ioctl_fence_info(sync_file, arg);
 
+   case SYNC_IOC_SET_DEADLINE:
+   return sync_file_ioctl_set_deadline(sync_file, arg);
+
default:
return -ENOTTY;
}
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index 7e42a5b7558b..d61752dca4c6 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -76,6 +76,27 @@ struct sync_file_info {
__u64   sync_fence_info;
 };
 
+/**
+ * struct sync_set_deadline - SYNC_IOC_SET_DEADLINE - set a deadline hint on a 
fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad:   must be zero
+ *
+ * Allows userspace to set a deadline on a fence, see &dma_fence_set_deadline
+ *
+ * The timebase for the deadline is CLOCK_MONOTONIC (same as vblank).  For
+ * example
+ *
+ * clock_gettime(CLOCK_MONOTONIC, &t);
+ * deadline_ns = (t.tv_sec * 1000000000L) + t.tv_nsec + ns_until_deadline
+ */
+struct sync_set_deadline {
+   __u64   deadline_ns;
+   /* Not strictly needed for alignment but gives some possibility
+* for future extension:
+*/
+   __u64   pad;
+};
+
 #define SYNC_IOC_MAGIC '>'
 
 /*
@@ -87,5 +108,6 @@ struct sync_file_info {
 
 #define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
 #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
+#define SYNC_IOC_SET_DEADLINE  _IOW(SYNC_IOC_MAGIC, 5, struct 
sync_set_deadline)
 
 #endif /* _UAPI_LINUX_SYNC_H */
-- 
2.39.2
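
Tying the uAPI together, a compositor-style caller might do roughly the
following (editor's sketch, assuming the header above; error handling
omitted):

  #include <stdint.h>
  #include <time.h>
  #include <sys/ioctl.h>
  #include <linux/sync_file.h>

  /* Hint that we need this fence ~one 60Hz frame from now: */
  static int set_deadline_one_frame(int sync_file_fd)
  {
          struct sync_set_deadline args = { .pad = 0 };
          struct timespec t;

          clock_gettime(CLOCK_MONOTONIC, &t);
          args.deadline_ns = (uint64_t)t.tv_sec * 1000000000ull +
                             t.tv_nsec + 16666667;

          return ioctl(sync_file_fd, SYNC_IOC_SET_DEADLINE, &args);
  }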



[Intel-gfx] [PATCH v10 07/15] dma-buf/sw_sync: Add fence deadline support

2023-03-08 Thread Rob Clark
From: Rob Clark 

This consists of simply storing the most recent deadline, and adding an
ioctl to retrieve the deadline.  This can be used in conjunction with
the SET_DEADLINE ioctl on a fence fd for testing.  I.e. create various
sw_sync fences, merge them into a fence-array, set deadline on the
fence-array and confirm that it is propagated properly to each fence.

v2: Switch UABI to express deadline as u64
v3: More verbose UAPI docs, show how to convert from timespec
v4: Better comments, track the soonest deadline, as a normal fence
implementation would, return an error if no deadline set.

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
Acked-by: Pekka Paalanen 
---
 drivers/dma-buf/sw_sync.c| 81 
 drivers/dma-buf/sync_debug.h |  2 +
 2 files changed, 83 insertions(+)

diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 348b3a9170fa..f53071bca3af 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -52,12 +52,33 @@ struct sw_sync_create_fence_data {
__s32   fence; /* fd of new fence */
 };
 
+/**
+ * struct sw_sync_get_deadline - get the deadline hint of a sw_sync fence
+ * @deadline_ns: absolute time of the deadline
+ * @pad:   must be zero
+ * @fence_fd:  the sw_sync fence fd (in)
+ *
+ * Return the earliest deadline set on the fence.  The timebase for the
+ * deadline is CLOCK_MONOTONIC (same as vblank).  If there is no deadline
+ * set on the fence, this ioctl will return -ENOENT.
+ */
+struct sw_sync_get_deadline {
+   __u64   deadline_ns;
+   __u32   pad;
+   __s32   fence_fd;
+};
+
 #define SW_SYNC_IOC_MAGIC  'W'
 
 #define SW_SYNC_IOC_CREATE_FENCE   _IOWR(SW_SYNC_IOC_MAGIC, 0,\
struct sw_sync_create_fence_data)
 
 #define SW_SYNC_IOC_INC    _IOW(SW_SYNC_IOC_MAGIC, 1, __u32)
+#define SW_SYNC_GET_DEADLINE   _IOWR(SW_SYNC_IOC_MAGIC, 2, \
+   struct sw_sync_get_deadline)
+
+
+#define SW_SYNC_HAS_DEADLINE_BIT   DMA_FENCE_FLAG_USER_BITS
 
 static const struct dma_fence_ops timeline_fence_ops;
 
@@ -171,6 +192,22 @@ static void timeline_fence_timeline_value_str(struct 
dma_fence *fence,
snprintf(str, size, "%d", parent->value);
 }
 
+static void timeline_fence_set_deadline(struct dma_fence *fence, ktime_t 
deadline)
+{
+   struct sync_pt *pt = dma_fence_to_sync_pt(fence);
+   unsigned long flags;
+
+   spin_lock_irqsave(fence->lock, flags);
+   if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+   if (ktime_before(deadline, pt->deadline))
+   pt->deadline = deadline;
+   } else {
+   pt->deadline = deadline;
+   set_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags);
+   }
+   spin_unlock_irqrestore(fence->lock, flags);
+}
+
 static const struct dma_fence_ops timeline_fence_ops = {
.get_driver_name = timeline_fence_get_driver_name,
.get_timeline_name = timeline_fence_get_timeline_name,
@@ -179,6 +216,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
.release = timeline_fence_release,
.fence_value_str = timeline_fence_value_str,
.timeline_value_str = timeline_fence_timeline_value_str,
+   .set_deadline = timeline_fence_set_deadline,
 };
 
 /**
@@ -387,6 +425,46 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, 
unsigned long arg)
return 0;
 }
 
+static int sw_sync_ioctl_get_deadline(struct sync_timeline *obj, unsigned long 
arg)
+{
+   struct sw_sync_get_deadline data;
+   struct dma_fence *fence;
+   struct sync_pt *pt;
+   int ret = 0;
+
+   if (copy_from_user(&data, (void __user *)arg, sizeof(data)))
+   return -EFAULT;
+
+   if (data.deadline_ns || data.pad)
+   return -EINVAL;
+
+   fence = sync_file_get_fence(data.fence_fd);
+   if (!fence)
+   return -EINVAL;
+
+   pt = dma_fence_to_sync_pt(fence);
+   if (!pt)
+   return -EINVAL;
+
+   spin_lock(fence->lock);
+   if (test_bit(SW_SYNC_HAS_DEADLINE_BIT, &fence->flags)) {
+   data.deadline_ns = ktime_to_ns(pt->deadline);
+   } else {
+   ret = -ENOENT;
+   }
+   spin_unlock(fence->lock);
+
+   dma_fence_put(fence);
+
+   if (ret)
+   return ret;
+
+   if (copy_to_user((void __user *)arg, &data, sizeof(data)))
+   return -EFAULT;
+
+   return 0;
+}
+
 static long sw_sync_ioctl(struct file *file, unsigned int cmd,
  unsigned long arg)
 {
@@ -399,6 +477,9 @@ static long sw_sync_ioctl(struct file *file, unsigned int 
cmd,
case SW_SYNC_IOC_INC:
return sw_sync_ioctl_inc(obj, arg);
 
+   case SW_SYNC_GET_DEADLINE:
+   return sw_sync_ioctl_get_deadline(obj, arg);
+
default:
return -ENOTTY;
}
diff --git a/drivers/dma-buf/syn

[Intel-gfx] [PATCH v10 08/15] drm/scheduler: Add fence deadline support

2023-03-08 Thread Rob Clark
As the finished fence is the one that is exposed to userspace, and
therefore the one that other operations, like atomic update, would
block on, we need to propagate the deadline from the finished
fence to the actual hw fence.

v2: Split into drm_sched_fence_set_parent() (ckoenig)
v3: Ensure a thread calling drm_sched_fence_set_deadline_finished() sees
fence->parent set before drm_sched_fence_set_parent() does this
test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT).

Signed-off-by: Rob Clark 
Acked-by: Luben Tuikov 
---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/gpu_scheduler.h | 17 +
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c
index 7fd869520ef2..fe9c6468e440 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -123,6 +123,37 @@ static void drm_sched_fence_release_finished(struct 
dma_fence *f)
dma_fence_put(&fence->scheduled);
 }
 
+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+ ktime_t deadline)
+{
+   struct drm_sched_fence *fence = to_drm_sched_fence(f);
+   struct dma_fence *parent;
+   unsigned long flags;
+
+   spin_lock_irqsave(&fence->lock, flags);
+
+   /* If we already have an earlier deadline, keep it: */
+   if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
+   ktime_before(fence->deadline, deadline)) {
+   spin_unlock_irqrestore(&fence->lock, flags);
+   return;
+   }
+
+   fence->deadline = deadline;
+   set_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
+
+   spin_unlock_irqrestore(&fence->lock, flags);
+
+   /*
+* smp_load_acquire() to ensure that if we are racing another
+* thread calling drm_sched_fence_set_parent(), that we see
+* the parent set before it calls test_bit(HAS_DEADLINE_BIT)
+*/
+   parent = smp_load_acquire(&fence->parent);
+   if (parent)
+   dma_fence_set_deadline(parent, deadline);
+}
+
 static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -133,6 +164,7 @@ static const struct dma_fence_ops 
drm_sched_fence_ops_finished = {
.get_driver_name = drm_sched_fence_get_driver_name,
.get_timeline_name = drm_sched_fence_get_timeline_name,
.release = drm_sched_fence_release_finished,
+   .set_deadline = drm_sched_fence_set_deadline_finished,
 };
 
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
@@ -147,6 +179,20 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
dma_fence *f)
 }
 EXPORT_SYMBOL(to_drm_sched_fence);
 
+void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
+   struct dma_fence *fence)
+{
+   /*
+* smp_store_release() to ensure another thread racing us
+* in drm_sched_fence_set_deadline_finished() sees the
+* fence's parent set before test_bit()
+*/
+   smp_store_release(&s_fence->parent, dma_fence_get(fence));
+   if (test_bit(DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT,
+&s_fence->finished.flags))
+   dma_fence_set_deadline(fence, s_fence->deadline);
+}
+
 struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity *entity,
  void *owner)
 {
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 4e6ad6e122bc..007f98c48f8d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1019,7 +1019,7 @@ static int drm_sched_main(void *param)
drm_sched_fence_scheduled(s_fence);
 
if (!IS_ERR_OR_NULL(fence)) {
-   s_fence->parent = dma_fence_get(fence);
+   drm_sched_fence_set_parent(s_fence, fence);
/* Drop for original kref_init of the fence */
dma_fence_put(fence);
 
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9db9e5e504ee..99584e457153 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -41,6 +41,15 @@
  */
 #define DRM_SCHED_FENCE_DONT_PIPELINE  DMA_FENCE_FLAG_USER_BITS
 
+/**
+ * DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT - A fence deadline hint has been set
+ *
+ * Because a deadline hint can be set before the backing hw
+ * fence is created, we need to keep track of whether a deadline has already
+ * been set.
+ */
+#define DRM_SCHED_FENCE_FLAG_HAS_DEADLINE_BIT  (DMA_FENCE_FLAG_USER_BITS + 1)
+
 enum dma_resv_usage;
 struct dma_resv;
 struct drm_gem_o
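
The barrier pairing is easier to see laid out as an interleaving (an
editor's illustration of the comments above, not code from the patch):

  /*
   *   set_parent()                         set_deadline_finished()
   *   ------------                         -----------------------
   *   A1: smp_store_release(&parent, hw)   B1: set HAS_DEADLINE_BIT (locked)
   *   A2: test_bit(HAS_DEADLINE_BIT)       B2: smp_load_acquire(&parent)
   *
   * The release/acquire pair guarantees that a parent pointer observed at
   * B2 is fully published, and A2 running after B1 sees the bit set; the
   * intent is that a deadline set concurrently with set_parent() is
   * forwarded to the hw fence by at least one of the two paths.
   */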

[Intel-gfx] [PATCH v10 03/15] dma-buf/fence-chain: Add fence deadline support

2023-03-08 Thread Rob Clark
From: Rob Clark 

Propagate the deadline to all the fences in the chain.

v2: Use dma_fence_chain_contained [Tvrtko]

Signed-off-by: Rob Clark 
Reviewed-by: Christian König  for this one.
---
 drivers/dma-buf/dma-fence-chain.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
index a0d920576ba6..9663ba1bb6ac 100644
--- a/drivers/dma-buf/dma-fence-chain.c
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -206,6 +206,17 @@ static void dma_fence_chain_release(struct dma_fence 
*fence)
dma_fence_free(fence);
 }
 
+
+static void dma_fence_chain_set_deadline(struct dma_fence *fence,
+ktime_t deadline)
+{
+   dma_fence_chain_for_each(fence, fence) {
+   struct dma_fence *f = dma_fence_chain_contained(fence);
+
+   dma_fence_set_deadline(f, deadline);
+   }
+}
+
 const struct dma_fence_ops dma_fence_chain_ops = {
.use_64bit_seqno = true,
.get_driver_name = dma_fence_chain_get_driver_name,
@@ -213,6 +224,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
.enable_signaling = dma_fence_chain_enable_signaling,
.signaled = dma_fence_chain_signaled,
.release = dma_fence_chain_release,
+   .set_deadline = dma_fence_chain_set_deadline,
 };
 EXPORT_SYMBOL(dma_fence_chain_ops);
 
-- 
2.39.2



[Intel-gfx] [PATCH v10 05/15] dma-buf/sync_file: Surface sync-file uABI

2023-03-08 Thread Rob Clark
From: Rob Clark 

We had all of the internal driver APIs, but not the all important
userspace uABI, in the dma-buf doc.  Fix that.  And re-arrange the
comments slightly as otherwise the comments for the ioctl nr defines
would not show up.

v2: Fix docs build warning coming from newly including the uabi header
in the docs build

Signed-off-by: Rob Clark 
Acked-by: Pekka Paalanen 
---
 Documentation/driver-api/dma-buf.rst | 10 ++--
 include/uapi/linux/sync_file.h   | 37 +++-
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 183e480d8cea..ff3f8da296af 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -203,8 +203,8 @@ DMA Fence unwrap
 .. kernel-doc:: include/linux/dma-fence-unwrap.h
:internal:
 
-DMA Fence uABI/Sync File
-
+DMA Fence Sync File
+~~~
 
 .. kernel-doc:: drivers/dma-buf/sync_file.c
:export:
@@ -212,6 +212,12 @@ DMA Fence uABI/Sync File
 .. kernel-doc:: include/linux/sync_file.h
:internal:
 
+DMA Fence Sync File uABI
+
+
+.. kernel-doc:: include/uapi/linux/sync_file.h
+   :internal:
+
 Indefinite DMA Fences
 ~
 
diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
index ee2dcfb3d660..7e42a5b7558b 100644
--- a/include/uapi/linux/sync_file.h
+++ b/include/uapi/linux/sync_file.h
@@ -16,12 +16,16 @@
 #include 
 
 /**
- * struct sync_merge_data - data passed to merge ioctl
+ * struct sync_merge_data - SYNC_IOC_MERGE: merge two fences
  * @name:  name of new fence
  * @fd2:   file descriptor of second fence
  * @fence: returns the fd of the new fence to userspace
  * @flags: merge_data flags
  * @pad:   padding for 64-bit alignment, should always be zero
+ *
+ * Creates a new fence containing copies of the sync_pts in both
+ * the calling fd and sync_merge_data.fd2.  Returns the new fence's
+ * fd in sync_merge_data.fence
  */
 struct sync_merge_data {
charname[32];
@@ -34,8 +38,8 @@ struct sync_merge_data {
 /**
  * struct sync_fence_info - detailed fence information
  * @obj_name:  name of parent sync_timeline
-* @driver_name:name of driver implementing the parent
-* @status: status of the fence 0:active 1:signaled <0:error
+ * @driver_name:   name of driver implementing the parent
+ * @status:status of the fence 0:active 1:signaled <0:error
  * @flags: fence_info flags
  * @timestamp_ns:  timestamp of status change in nanoseconds
  */
@@ -48,14 +52,19 @@ struct sync_fence_info {
 };
 
 /**
- * struct sync_file_info - data returned from fence info ioctl
+ * struct sync_file_info - SYNC_IOC_FILE_INFO: get detailed information on a sync_file
  * @name:  name of fence
  * @status:status of fence. 1: signaled 0:active <0:error
  * @flags: sync_file_info flags
  * @num_fences number of fences in the sync_file
  * @pad:   padding for 64-bit alignment, should always be zero
- * @sync_fence_info: pointer to array of structs sync_fence_info with all
+ * @sync_fence_info: pointer to array of struct &sync_fence_info with all
  *  fences in the sync_file
+ *
+ * Takes a struct sync_file_info. If num_fences is 0, the field is updated
+ * with the actual number of fences. If num_fences is > 0, the system will
+ * use the pointer provided on sync_fence_info to return up to num_fences of
+ * struct sync_fence_info, with detailed fence information.
  */
 struct sync_file_info {
charname[32];
@@ -69,30 +78,14 @@ struct sync_file_info {
 
 #define SYNC_IOC_MAGIC '>'
 
-/**
+/*
  * Opcodes  0, 1 and 2 were burned during a API change to avoid users of the
  * old API to get weird errors when trying to handling sync_files. The API
  * change happened during the de-stage of the Sync Framework when there was
  * no upstream users available.
  */
 
-/**
- * DOC: SYNC_IOC_MERGE - merge two fences
- *
- * Takes a struct sync_merge_data.  Creates a new fence containing copies of
- * the sync_pts in both the calling fd and sync_merge_data.fd2.  Returns the
- * new fence's fd in sync_merge_data.fence
- */
 #define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
-
-/**
- * DOC: SYNC_IOC_FILE_INFO - get detailed information on a sync_file
- *
- * Takes a struct sync_file_info. If num_fences is 0, the field is updated
- * with the actual number of fences. If num_fences is > 0, the system will
- * use the pointer provided on sync_fence_info to return up to num_fences of
- * struct sync_fence_info, with detailed fence information.
- */
 #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
 
 #endif /* _UAPI_LINUX_SYNC_H */
-- 
2.39.2
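
The two-call contract documented for SYNC_IOC_FILE_INFO is the part userspace most
often gets wrong, so here is a hedged userspace sketch of both ioctls (illustrative
helper code, not part of the patch):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/sync_file.h>

    static int merge_fences(int fd1, int fd2)
    {
            struct sync_merge_data merge = { .name = "merged", .fd2 = fd2 };

            /* SYNC_IOC_MERGE: new fence containing the sync_pts of both fds */
            if (ioctl(fd1, SYNC_IOC_MERGE, &merge))
                    return -1;
            return merge.fence;
    }

    static void dump_sync_file(int fd)
    {
            struct sync_file_info info = { 0 };
            struct sync_fence_info *fences;
            uint32_t i;

            /* First call with num_fences == 0 only reports the count */
            if (ioctl(fd, SYNC_IOC_FILE_INFO, &info))
                    return;

            fences = calloc(info.num_fences, sizeof(*fences));
            if (!fences)
                    return;
            info.sync_fence_info = (uint64_t)(uintptr_t)fences;

            /* Second call fills in up to num_fences entries */
            if (!ioctl(fd, SYNC_IOC_FILE_INFO, &info))
                    for (i = 0; i < info.num_fences; i++)
                            printf("%s: status %d\n", fences[i].driver_name,
                                   fences[i].status);

            free(fences);
    }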



[Intel-gfx] [PATCH v10 04/15] dma-buf/dma-resv: Add a way to set fence deadline

2023-03-08 Thread Rob Clark
From: Rob Clark 

Add a way to set a deadline on remaining resv fences according to the
requested usage.

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
---
 drivers/dma-buf/dma-resv.c | 22 ++
 include/linux/dma-resv.h   |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 1c76aed8e262..2a594b754af1 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -684,6 +684,28 @@ long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
 }
 EXPORT_SYMBOL_GPL(dma_resv_wait_timeout);
 
+/**
+ * dma_resv_set_deadline - Set a deadline on reservation's objects fences
+ * @obj: the reservation object
+ * @usage: controls which fences to include, see enum dma_resv_usage.
+ * @deadline: the requested deadline (MONOTONIC)
+ *
+ * May be called without holding the dma_resv lock.  Sets @deadline on
+ * all fences filtered by @usage.
+ */
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+  ktime_t deadline)
+{
+   struct dma_resv_iter cursor;
+   struct dma_fence *fence;
+
+   dma_resv_iter_begin(&cursor, obj, usage);
+   dma_resv_for_each_fence_unlocked(&cursor, fence) {
+           dma_fence_set_deadline(fence, deadline);
+   }
+   dma_resv_iter_end(&cursor);
+}
+EXPORT_SYMBOL_GPL(dma_resv_set_deadline);
 
 /**
  * dma_resv_test_signaled - Test if a reservation object's fences have been
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 0637659a702c..8d0e34dad446 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -479,6 +479,8 @@ int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage,
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
 long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage,
   bool intr, unsigned long timeout);
+void dma_resv_set_deadline(struct dma_resv *obj, enum dma_resv_usage usage,
+  ktime_t deadline);
 bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage);
 void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq);
 
-- 
2.39.2
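
On the waiter side this collapses a per-fence loop into one call. For instance, a
display driver about to sample a buffer could hint every fence it implicitly waits
on; a sketch with illustrative locals (the usage filter choice depends on the
waiter, a reader waits on writers):

    /*
     * Hint all fences that must signal before the buffer can be read,
     * e.g. ahead of a page-flip that scans it out:
     */
    dma_resv_set_deadline(fb_obj->resv, DMA_RESV_USAGE_WRITE,
                          next_vblank_time);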



[Intel-gfx] [PATCH v10 02/15] dma-buf/fence-array: Add fence deadline support

2023-03-08 Thread Rob Clark
From: Rob Clark 

Propagate the deadline to all the fences in the array.

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
---
 drivers/dma-buf/dma-fence-array.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
index 5c8a7084577b..9b3ce8948351 100644
--- a/drivers/dma-buf/dma-fence-array.c
+++ b/drivers/dma-buf/dma-fence-array.c
@@ -123,12 +123,23 @@ static void dma_fence_array_release(struct dma_fence *fence)
dma_fence_free(fence);
 }
 
+static void dma_fence_array_set_deadline(struct dma_fence *fence,
+ktime_t deadline)
+{
+   struct dma_fence_array *array = to_dma_fence_array(fence);
+   unsigned i;
+
+   for (i = 0; i < array->num_fences; ++i)
+   dma_fence_set_deadline(array->fences[i], deadline);
+}
+
 const struct dma_fence_ops dma_fence_array_ops = {
.get_driver_name = dma_fence_array_get_driver_name,
.get_timeline_name = dma_fence_array_get_timeline_name,
.enable_signaling = dma_fence_array_enable_signaling,
.signaled = dma_fence_array_signaled,
.release = dma_fence_array_release,
+   .set_deadline = dma_fence_array_set_deadline,
 };
 EXPORT_SYMBOL(dma_fence_array_ops);
 
-- 
2.39.2
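
A consumer-side sketch of the fan-out (illustrative context and seqno; note that
dma_fence_array_create() takes ownership of the fences array and of the references
it holds, so the array must be heap-allocated):

    struct dma_fence **fences;
    struct dma_fence_array *array;

    fences = kmalloc_array(2, sizeof(*fences), GFP_KERNEL);
    if (!fences)
            return;
    fences[0] = f1;
    fences[1] = f2;

    array = dma_fence_array_create(2, fences, dma_fence_context_alloc(1),
                                   1, false);
    if (array)
            /* propagated to f1 and f2 via dma_fence_array_set_deadline() */
            dma_fence_set_deadline(&array->base, deadline);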



[Intel-gfx] [PATCH v10 01/15] dma-buf/dma-fence: Add deadline awareness

2023-03-08 Thread Rob Clark
From: Rob Clark 

Add a way to hint to the fence signaler of an upcoming deadline, such as
vblank, which the fence waiter would prefer not to miss.  This is to aid
the fence signaler in making power management decisions, like boosting
frequency as the deadline approaches, and in gaining awareness of missed
deadlines so that this can be factored into frequency scaling.

v2: Drop dma_fence::deadline and related logic to filter duplicate
deadlines, to avoid increasing dma_fence size.  The fence-context
implementation will need similar logic to track deadlines of all
the fences on the same timeline.  [ckoenig]
v3: Clarify locking wrt. set_deadline callback
v4: Clarify in docs comment that this is a hint
v5: Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v6: More docs
v7: Fix typo, clarify past deadlines

Signed-off-by: Rob Clark 
Reviewed-by: Christian König 
Acked-by: Pekka Paalanen 
Reviewed-by: Bagas Sanjaya 
---
 Documentation/driver-api/dma-buf.rst |  6 +++
 drivers/dma-buf/dma-fence.c  | 59 
 include/linux/dma-fence.h| 22 +++
 3 files changed, 87 insertions(+)

diff --git a/Documentation/driver-api/dma-buf.rst b/Documentation/driver-api/dma-buf.rst
index 622b8156d212..183e480d8cea 100644
--- a/Documentation/driver-api/dma-buf.rst
+++ b/Documentation/driver-api/dma-buf.rst
@@ -164,6 +164,12 @@ DMA Fence Signalling Annotations
 .. kernel-doc:: drivers/dma-buf/dma-fence.c
:doc: fence signalling annotation
 
+DMA Fence Deadline Hints
+
+
+.. kernel-doc:: drivers/dma-buf/dma-fence.c
+   :doc: deadline hints
+
 DMA Fences Functions Reference
 ~~
 
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 0de0482cd36e..f177c56269bb 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -912,6 +912,65 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+/**
+ * DOC: deadline hints
+ *
+ * In an ideal world, it would be possible to pipeline a workload sufficiently
+ * that a utilization based device frequency governor could arrive at a minimum
+ * frequency that meets the requirements of the use-case, in order to minimize
+ * power consumption.  But in the real world there are many workloads which
+ * defy this ideal.  For example, but not limited to:
+ *
+ * * Workloads that ping-pong between device and CPU, with alternating periods
+ *   of CPU waiting for device, and device waiting on CPU.  This can result in
+ *   devfreq and cpufreq seeing idle time in their respective domains and, as a
+ *   result, reducing frequency.
+ *
+ * * Workloads that interact with a periodic time based deadline, such as
+ *   double buffered GPU rendering vs vblank sync'd page flipping.  In this
+ *   scenario, missing a vblank deadline results in an *increase* in idle time
+ *   on the GPU (since it has to wait an additional vblank period), sending a
+ *   signal to the GPU's devfreq to reduce frequency, when in fact the opposite
+ *   is what is needed.
+ *
+ * To this end, deadline hint(s) can be set on a &dma_fence via
+ * &dma_fence_set_deadline.  The deadline hint provides a way for the waiting
+ * driver, or userspace, to convey an appropriate sense of urgency to the
+ * signaling driver.
+ *
+ * A deadline hint is given in absolute ktime (CLOCK_MONOTONIC for userspace
+ * facing APIs).  The time could either be some point in the future (such as
+ * the vblank based deadline for page-flipping, or the start of a compositor's
+ * composition cycle), or the current time to indicate an immediate deadline
+ * hint (i.e. forward progress cannot be made until this fence is signaled).
+ *
+ * Multiple deadlines may be set on a given fence, even in parallel.  See the
+ * documentation for &dma_fence_ops.set_deadline.
+ *
+ * The deadline hint is just that, a hint.  The driver that created the fence
+ * may react by increasing frequency, making different scheduling choices,
+ * etc., or by doing nothing at all.
+ */
+
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline hint
+ * @fence:    the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ *            signaled
+ *
+ * Give the fence signaler a hint about an upcoming deadline, such as
+ * vblank, by which point the waiter would prefer the fence to be
+ * signaled.  This is intended to give feedback to the fence signaler
+ * to aid in power management decisions, such as boosting GPU frequency
+ * if a periodic vblank deadline is approaching but the fence is not
+ * yet signaled.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
+   fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
 /**
 * dma_fence_describe - Dump fence description into seq_file
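
To make the vblank case from the DOC text concrete: roughly what the display side
of this series ends up doing when it collects in-fences for an atomic commit,
assuming the drm_crtc_next_vblank_start() helper added later in the series
(illustrative locals, error handling elided):

    ktime_t vbltime;

    /* hint each in-fence with the start of the upcoming vblank */
    if (drm_crtc_next_vblank_start(crtc, &vbltime) == 0)
            dma_fence_set_deadline(new_plane_state->fence, vbltime);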
  

[Intel-gfx] [PATCH v10 00/15] dma-fence: Deadline awareness

2023-03-08 Thread Rob Clark
From: Rob Clark 

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Drop (E)POLLPRI support.. I still like it, but not essential and
it can always be revived later.  Fix doc build warning.
v10: Update 11/15 to handle multiple CRTCs

Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Surface sync-file uABI
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 Documentation/driver-api/dma-buf.rst| 16 -
 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 12 
 drivers/dma-buf/dma-fence.c | 60 ++
 drivers/dma-buf/dma-resv.c  | 22 +++
 drivers/dma-buf/sw_sync.c   | 81 +
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 19 ++
 drivers/gpu/drm/drm_atomic_helper.c | 37 +++
 drivers/gpu/drm/drm_syncobj.c   | 64 +++
 drivers/gpu/drm/drm_vblank.c| 53 +---
 drivers/gpu/drm/i915/i915_request.c | 20 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
 drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 ++
 drivers/gpu/drm/msm/msm_fence.h | 20 ++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/msm/msm_kms.h   |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 ++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h | 17 ++
 include/linux/dma-fence.h   | 22 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 17

Re: [Intel-gfx] [PATCH v9 15/15] drm/i915: Add deadline based boost support

2023-03-03 Thread Rob Clark
On Fri, Mar 3, 2023 at 7:20 AM Ville Syrjälä
 wrote:
>
> On Fri, Mar 03, 2023 at 05:00:03PM +0200, Ville Syrjälä wrote:
> > On Fri, Mar 03, 2023 at 06:48:43AM -0800, Rob Clark wrote:
> > > On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin
> > >  wrote:
> > > >
> > > >
> > > > On 03/03/2023 03:21, Rodrigo Vivi wrote:
> > > > > On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
> > > > >> From: Rob Clark 
> > > > >>
> > > > >
> > > > > missing some wording here...
> > > > >
> > > > >> v2: rebase
> > > > >>
> > > > >> Signed-off-by: Rob Clark 
> > > > >> ---
> > > > >>   drivers/gpu/drm/i915/i915_request.c | 20 
> > > > >>   1 file changed, 20 insertions(+)
> > > > >>
> > > > >> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> > > > >> b/drivers/gpu/drm/i915/i915_request.c
> > > > >> index 7503dcb9043b..44491e7e214c 100644
> > > > >> --- a/drivers/gpu/drm/i915/i915_request.c
> > > > >> +++ b/drivers/gpu/drm/i915/i915_request.c
> > > > >> @@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct 
> > > > >> dma_fence *fence)
> > > > >>  return i915_request_enable_breadcrumb(to_request(fence));
> > > > >>   }
> > > > >>
> > > > >> +static void i915_fence_set_deadline(struct dma_fence *fence, 
> > > > >> ktime_t deadline)
> > > > >> +{
> > > > >> +struct i915_request *rq = to_request(fence);
> > > > >> +
> > > > >> +if (i915_request_completed(rq))
> > > > >> +return;
> > > > >> +
> > > > >> +if (i915_request_started(rq))
> > > > >> +return;
> > > > >
> > > > > why do we skip the boost if already started?
> > > > > don't we want to boost the freq anyway?
> > > >
> > > > I'd wager Rob is just copying the current i915 wait boost logic.
> > >
> > > Yup, and probably incorrectly.. Matt reported fewer boosts/sec
> > > compared to your RFC, this could be the bug
> >
> > I don't think i915 calls drm_atomic_helper_wait_for_fences()
> > so that could explain something.
>
> Oh, I guess this wasn't even supposed to take over the current
> display boost stuff since you didn't remove the old one.

Right, I didn't try to replace the current thing.. but hopefully at
least make it possible for i915 to use more of the atomic helpers in
the future

BR,
-R

> The current one just boosts after a missed vblank. The deadline
> could use your timer approach I suppose and boost already a bit
> earlier in the hopes of not missing the vblank.
>
> --
> Ville Syrjälä
> Intel


Re: [Intel-gfx] [Freedreno] [PATCH v9 15/15] drm/i915: Add deadline based boost support

2023-03-03 Thread Rob Clark
On Fri, Mar 3, 2023 at 7:08 AM Tvrtko Ursulin
 wrote:
>
>
> On 03/03/2023 14:48, Rob Clark wrote:
> > On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 03/03/2023 03:21, Rodrigo Vivi wrote:
> >>> On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
> >>>> From: Rob Clark 
> >>>>
> >>>
> >>> missing some wording here...
> >>>
> >>>> v2: rebase
> >>>>
> >>>> Signed-off-by: Rob Clark 
> >>>> ---
> >>>>drivers/gpu/drm/i915/i915_request.c | 20 
> >>>>1 file changed, 20 insertions(+)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> >>>> b/drivers/gpu/drm/i915/i915_request.c
> >>>> index 7503dcb9043b..44491e7e214c 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_request.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_request.c
> >>>> @@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct 
> >>>> dma_fence *fence)
> >>>>   return i915_request_enable_breadcrumb(to_request(fence));
> >>>>}
> >>>>
> >>>> +static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t 
> >>>> deadline)
> >>>> +{
> >>>> +struct i915_request *rq = to_request(fence);
> >>>> +
> >>>> +if (i915_request_completed(rq))
> >>>> +return;
> >>>> +
> >>>> +if (i915_request_started(rq))
> >>>> +return;
> >>>
> >>> why do we skip the boost if already started?
> >>> don't we want to boost the freq anyway?
> >>
> >> I'd wager Rob is just copying the current i915 wait boost logic.
> >
> > Yup, and probably incorrectly.. Matt reported fewer boosts/sec
> > compared to your RFC, this could be the bug
>
> Hm, there I have preserved this same !i915_request_started logic.
>
> Presumably it's not just fewer boosts but lower performance. How is he
> setting the deadline? Somehow from clFlush or so?

Yeah, fewer boosts, lower freq/perf.. I cobbled together a quick mesa
hack to set the DEADLINE flag on syncobj waits, but it seems likely
that I missed something somewhere

BR,
-R

> Regards,
>
> Tvrtko
>
> P.S. Take note that I did not post the latest version of my RFC. The one
> where I fix the fence chain and array misses you pointed out. I did not
> think it would be worthwhile given no universal love for it, but if
> people are testing with it more widely than I was aware, perhaps I should.
>
> >>>> +
> >>>> +/*
> >>>> + * TODO something more clever for deadlines that are in the
> >>>> + * future.  I think probably track the nearest deadline in
> >>>> + * rq->timeline and set timer to trigger boost accordingly?
> >>>> + */
> >>>
> >>> I'm afraid it will be very hard to find some heuristics of what's
> >>> late enough for the boost no?
> >>> I mean, how early to boost the freq on an upcoming deadline for the
> >>> timer?
> >>
> >> We can off load this patch from Rob and deal with it separately, or
> >> after the fact?
> >
> > That is completely my intention, I expect you to replace my i915 patch ;-)
> >
> > Rough idea when everyone is happy with the core bits is to setup an
> > immutable branch without the driver specific patches, which could be
> > merged into drm-next and $driver-next and then each driver team can
> > add their own driver patches on top
> >
> > BR,
> > -R
> >
> >> It's a half solution without a smarter scheduler too. Like
> >> https://lore.kernel.org/all/20210208105236.28498-10-ch...@chris-wilson.co.uk/,
> >> or if GuC plans to do something like that at any point.
> >>
> >> Or bump the priority too if deadline is looming?
> >>
> >> IMO it is not very effective to fiddle with the heuristic on an ad-hoc
> >> basis. For instance I have a new heuristics which improves the
> >> problematic OpenCL cases for further 5% (relative to the current
> >> waitboost improvement from adding missing syncobj waitboost). But I
> >> can't really test properly for regressions over platforms, stacks,
> >> workloads.. :(
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>>
> >>>> +
> >>>> +intel_rps_boost(rq);
> >>>> +}
> >>>> +
> >>>>static signed long i915_fence_wait(struct dma_fence *fence,
> >>>>  bool interruptible,
> >>>>  signed long timeout)
> >>>> @@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
> >>>>   .signaled = i915_fence_signaled,
> >>>>   .wait = i915_fence_wait,
> >>>>   .release = i915_fence_release,
> >>>> +.set_deadline = i915_fence_set_deadline,
> >>>>};
> >>>>
> >>>>static void irq_execute_cb(struct irq_work *wrk)
> >>>> --
> >>>> 2.39.1
> >>>>


Re: [Intel-gfx] [PATCH v9 15/15] drm/i915: Add deadline based boost support

2023-03-03 Thread Rob Clark
On Thu, Mar 2, 2023 at 7:21 PM Rodrigo Vivi  wrote:
>
> On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
>
> missing some wording here...

the wording should be "Pls replace this patch, kthx" ;-)

>
> > v2: rebase
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c 
> > b/drivers/gpu/drm/i915/i915_request.c
> > index 7503dcb9043b..44491e7e214c 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence 
> > *fence)
> >   return i915_request_enable_breadcrumb(to_request(fence));
> >  }
> >
> > +static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t 
> > deadline)
> > +{
> > + struct i915_request *rq = to_request(fence);
> > +
> > + if (i915_request_completed(rq))
> > + return;
> > +
> > + if (i915_request_started(rq))
> > + return;
>
> why do we skip the boost if already started?
> don't we want to boost the freq anyway?
>
> > +
> > + /*
> > +  * TODO something more clever for deadlines that are in the
> > +  * future.  I think probably track the nearest deadline in
> > +  * rq->timeline and set timer to trigger boost accordingly?
> > +  */
>
> I'm afraid it will be very hard to find some heuristics of what's
> late enough for the boost no?
> I mean, how early to boost the freq on an upcoming deadline for the
> timer?

So, from my understanding of i915 boosting, it applies more
specifically to a given request (vs msm which just bumps up the freq
until the next devfreq sampling period which then recalculates target
freq based on busy cycles and avg freq over the last sampling period).
For msm I just set a timer for 3ms before the deadline and boost if
the fence isn't signaled when the timer fires.  It is kinda impossible
to predict, even for userspace, how long a job will take to complete,
so the goal isn't really to finish the specified job by the deadline,
but instead to avoid devfreq landing at a local minimum (maximum?)

AFAIU what I _think_ would make sense for i915 is to do the same thing
you do if you miss a vblank pageflip deadline if the deadline passes
without the fence signaling.

BR,
-R

> > +
> > + intel_rps_boost(rq);
> > +}
> > +
> >  static signed long i915_fence_wait(struct dma_fence *fence,
> >  bool interruptible,
> >  signed long timeout)
> > @@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
> >   .signaled = i915_fence_signaled,
> >   .wait = i915_fence_wait,
> >   .release = i915_fence_release,
> > + .set_deadline = i915_fence_set_deadline,
> >  };
> >
> >  static void irq_execute_cb(struct irq_work *wrk)
> > --
> > 2.39.1
> >
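
The timer scheme described above is straightforward to picture. A hedged sketch
with hypothetical names (the real implementation is msm_fence.c in this series,
which likewise defers the boost to a worker):

    struct my_fence_ctx {
            struct dma_fence *fence;        /* fence being tracked */
            ktime_t deadline;               /* nearest deadline seen so far */
            struct hrtimer timer;
            struct work_struct boost_work;  /* performs the devfreq bump */
    };

    static enum hrtimer_restart boost_timer_fn(struct hrtimer *t)
    {
            struct my_fence_ctx *ctx =
                    container_of(t, struct my_fence_ctx, timer);

            /* hardirq context: defer the actual boost to process context */
            if (!dma_fence_is_signaled(ctx->fence))
                    queue_work(system_wq, &ctx->boost_work);

            return HRTIMER_NORESTART;
    }

    static void my_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
    {
            struct my_fence_ctx *ctx = to_my_fence_ctx(f);

            /* only ever move the timer earlier, tracking the nearest deadline */
            if (!hrtimer_active(&ctx->timer) ||
                ktime_before(deadline, ctx->deadline)) {
                    ctx->deadline = deadline;
                    /* fire ~3ms ahead, leaving time for the boost to help */
                    hrtimer_start(&ctx->timer,
                                  ktime_sub(deadline, ms_to_ktime(3)),
                                  HRTIMER_MODE_ABS);
            }
    }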


Re: [Intel-gfx] [PATCH v9 15/15] drm/i915: Add deadline based boost support

2023-03-03 Thread Rob Clark
On Fri, Mar 3, 2023 at 1:58 AM Tvrtko Ursulin
 wrote:
>
>
> On 03/03/2023 03:21, Rodrigo Vivi wrote:
> > On Thu, Mar 02, 2023 at 03:53:37PM -0800, Rob Clark wrote:
> >> From: Rob Clark 
> >>
> >
> > missing some wording here...
> >
> >> v2: rebase
> >>
> >> Signed-off-by: Rob Clark 
> >> ---
> >>   drivers/gpu/drm/i915/i915_request.c | 20 
> >>   1 file changed, 20 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> >> b/drivers/gpu/drm/i915/i915_request.c
> >> index 7503dcb9043b..44491e7e214c 100644
> >> --- a/drivers/gpu/drm/i915/i915_request.c
> >> +++ b/drivers/gpu/drm/i915/i915_request.c
> >> @@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct 
> >> dma_fence *fence)
> >>  return i915_request_enable_breadcrumb(to_request(fence));
> >>   }
> >>
> >> +static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t 
> >> deadline)
> >> +{
> >> +struct i915_request *rq = to_request(fence);
> >> +
> >> +if (i915_request_completed(rq))
> >> +return;
> >> +
> >> +if (i915_request_started(rq))
> >> +return;
> >
> > why do we skip the boost if already started?
> > don't we want to boost the freq anyway?
>
> I'd wager Rob is just copying the current i915 wait boost logic.

Yup, and probably incorrectly.. Matt reported fewer boosts/sec
compared to your RFC, this could be the bug

> >> +
> >> +/*
> >> + * TODO something more clever for deadlines that are in the
> >> + * future.  I think probably track the nearest deadline in
> >> + * rq->timeline and set timer to trigger boost accordingly?
> >> + */
> >
> > I'm afraid it will be very hard to find some heuristics of what's
> > late enough for the boost no?
> > I mean, how early to boost the freq on an upcoming deadline for the
> > timer?
>
> We can off load this patch from Rob and deal with it separately, or
> after the fact?

That is completely my intention, I expect you to replace my i915 patch ;-)

Rough idea when everyone is happy with the core bits is to setup an
immutable branch without the driver specific patches, which could be
merged into drm-next and $driver-next and then each driver team can
add their own driver patches on top

BR,
-R

> It's a half solution without a smarter scheduler too. Like
> https://lore.kernel.org/all/20210208105236.28498-10-ch...@chris-wilson.co.uk/,
> or if GuC plans to do something like that at any point.
>
> Or bump the priority too if deadline is looming?
>
> IMO it is not very effective to fiddle with the heuristic on an ad-hoc
> basis. For instance I have a new heuristics which improves the
> problematic OpenCL cases for further 5% (relative to the current
> waitboost improvement from adding missing syncobj waitboost). But I
> can't really test properly for regressions over platforms, stacks,
> workloads.. :(
>
> Regards,
>
> Tvrtko
>
> >
> >> +
> >> +intel_rps_boost(rq);
> >> +}
> >> +
> >>   static signed long i915_fence_wait(struct dma_fence *fence,
> >> bool interruptible,
> >> signed long timeout)
> >> @@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
> >>  .signaled = i915_fence_signaled,
> >>  .wait = i915_fence_wait,
> >>  .release = i915_fence_release,
> >> +.set_deadline = i915_fence_set_deadline,
> >>   };
> >>
> >>   static void irq_execute_cb(struct irq_work *wrk)
> >> --
> >> 2.39.1
> >>


[Intel-gfx] [PATCH v9 15/15] drm/i915: Add deadline based boost support

2023-03-02 Thread Rob Clark
From: Rob Clark 

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1



[Intel-gfx] [PATCH v9 00/15] dma-fence: Deadline awareness

2023-03-02 Thread Rob Clark
From: Rob Clark 

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.
v9: Drop (E)POLLPRI support.. I still like it, but not essential and
it can always be revived later.  Fix doc build warning.

Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Surface sync-file uABI
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 Documentation/driver-api/dma-buf.rst| 16 -
 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 12 
 drivers/dma-buf/dma-fence.c | 60 ++
 drivers/dma-buf/dma-resv.c  | 22 +++
 drivers/dma-buf/sw_sync.c   | 81 +
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 19 ++
 drivers/gpu/drm/drm_atomic_helper.c | 36 +++
 drivers/gpu/drm/drm_syncobj.c   | 64 +++
 drivers/gpu/drm/drm_vblank.c| 53 +---
 drivers/gpu/drm/i915/i915_request.c | 20 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
 drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 ++
 drivers/gpu/drm/msm/msm_fence.h | 20 ++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/msm/msm_kms.h   |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 ++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h | 17 ++
 include/linux/dma-fence.h   | 22 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 17 ++
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 59 +++---
 28 files changed, 639 insertions(+), 79 deletions(-)

-- 
2.39.1



[Intel-gfx] [PATCH v8 16/16] drm/i915: Add deadline based boost support

2023-02-28 Thread Rob Clark
From: Rob Clark 

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1



[Intel-gfx] [PATCH v8 00/16] dma-fence: Deadline awareness

2023-02-28 Thread Rob Clark
From: Rob Clark 

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.
v8: Add patch to surface sync_file UAPI, and more docs updates.

Rob Clark (16):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Surface sync-file uABI
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sync_file: Support (E)POLLPRI
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 Documentation/driver-api/dma-buf.rst| 16 -
 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 12 
 drivers/dma-buf/dma-fence.c | 60 ++
 drivers/dma-buf/dma-resv.c  | 22 +++
 drivers/dma-buf/sw_sync.c   | 81 +
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 27 +
 drivers/gpu/drm/drm_atomic_helper.c | 36 +++
 drivers/gpu/drm/drm_syncobj.c   | 64 +++
 drivers/gpu/drm/drm_vblank.c| 53 +---
 drivers/gpu/drm/i915/i915_request.c | 20 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
 drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 ++
 drivers/gpu/drm/msm/msm_fence.h | 20 ++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/msm/msm_kms.h   |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 ++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h | 17 ++
 include/linux/dma-fence.h   | 22 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 17 ++
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 57 ++---
 28 files changed, 646 insertions(+), 78 deletions(-)

-- 
2.39.1



Re: [Intel-gfx] [PATCH v7 00/15] dma-fence: Deadline awareness

2023-02-28 Thread Rob Clark
On Tue, Feb 28, 2023 at 4:43 AM Bagas Sanjaya  wrote:
>
> On Mon, Feb 27, 2023 at 11:35:06AM -0800, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This series adds a deadline hint to fences, so realtime deadlines
> > such as vblank can be communicated to the fence signaller for power/
> > frequency management decisions.
> >
> > This is partially inspired by a trick i915 does, but implemented
> > via dma-fence for a couple of reasons:
> >
> > 1) To continue to be able to use the atomic helpers
> > 2) To support cases where display and gpu are different drivers
> >
> > This iteration adds a dma-fence ioctl to set a deadline (both to
> > support igt-tests, and compositors which delay decisions about which
> > client buffer to display), and a sw_sync ioctl to read back the
> > deadline.  IGT tests utilizing these can be found at:
> >
> >   
> > https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
> >
> >
> > v1: https://patchwork.freedesktop.org/series/93035/
> > v2: Move filtering out of later deadlines to fence implementation
> > to avoid increasing the size of dma_fence
> > v3: Add support in fence-array and fence-chain; Add some uabi to
> > support igt tests and userspace compositors.
> > v4: Rebase, address various comments, and add syncobj deadline
> > support, and sync_file EPOLLPRI based on experience with perf/
> > freq issues with clvk compute workloads on i915 (anv)
> > v5: Clarify that this is a hint as opposed to a more hard deadline
> > guarantee, switch to using u64 ns values in UABI (still absolute
> > CLOCK_MONOTONIC values), drop syncobj related cap and driver
> > feature flag in favor of allowing count_handles==0 for probing
> > kernel support.
> > v6: Re-work vblank helper to calculate time of _start_ of vblank,
> > and work correctly if the last vblank event was more than a
> > frame ago.  Add (mostly unrelated) drm/msm patch which also
> > uses the vblank helper.  Use dma_fence_chain_contained().  More
> > verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
> > v7: Fix kbuild complaints about vblank helper.  Add more docs.
> >
>
> I want to apply this series for testing, but it can't be applied cleanly
> on current drm-misc tree. On what tree (and commit) is this series based
> on?

You can find my branch here:

https://gitlab.freedesktop.org/robclark/msm/-/commits/dma-fence/deadline

BR,
-R


[Intel-gfx] [PATCH v7 15/15] drm/i915: Add deadline based boost support

2023-02-27 Thread Rob Clark
From: Rob Clark 

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1



[Intel-gfx] [PATCH v7 00/15] dma-fence: Deadline awareness

2023-02-27 Thread Rob Clark
From: Rob Clark 

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.
v7: Fix kbuild complaints about vblank helper.  Add more docs.

Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sync_file: Support (E)POLLPRI
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 Documentation/driver-api/dma-buf.rst|  6 ++
 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 12 
 drivers/dma-buf/dma-fence.c | 60 
 drivers/dma-buf/dma-resv.c  | 22 
 drivers/dma-buf/sw_sync.c   | 58 +++
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 27 +
 drivers/gpu/drm/drm_atomic_helper.c | 36 
 drivers/gpu/drm/drm_syncobj.c   | 64 -
 drivers/gpu/drm/drm_vblank.c| 53 +++---
 drivers/gpu/drm/i915/i915_request.c | 20 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
 drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 +
 drivers/gpu/drm/msm/msm_fence.h | 20 +++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/msm/msm_kms.h   |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h | 17 ++
 include/linux/dma-fence.h   | 20 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 17 ++
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 26 +
 28 files changed, 603 insertions(+), 55 deletions(-)

-- 
2.39.1



[Intel-gfx] [PATCH v6 15/15] drm/i915: Add deadline based boost support

2023-02-24 Thread Rob Clark
From: Rob Clark 

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence *fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1



[Intel-gfx] [PATCH v6 00/15] dma-fence: Deadline awareness

2023-02-24 Thread Rob Clark
From: Rob Clark 


This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.
v6: Re-work vblank helper to calculate time of _start_ of vblank,
and work correctly if the last vblank event was more than a
frame ago.  Add (mostly unrelated) drm/msm patch which also
uses the vblank helper.  Use dma_fence_chain_contained().  More
verbose syncobj UABI comments.  Drop DMA_FENCE_FLAG_HAS_DEADLINE_BIT.

Rob Clark (15):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sync_file: Support (E)POLLPRI
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/msm/atomic: Switch to vblank_start helper
  drm/i915: Add deadline based boost support

 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 12 
 drivers/dma-buf/dma-fence.c | 20 +++
 drivers/dma-buf/dma-resv.c  | 22 
 drivers/dma-buf/sw_sync.c   | 58 +++
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 27 +
 drivers/gpu/drm/drm_atomic_helper.c | 36 
 drivers/gpu/drm/drm_syncobj.c   | 64 -
 drivers/gpu/drm/drm_vblank.c| 52 ++---
 drivers/gpu/drm/i915/i915_request.c | 20 +++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 15 -
 drivers/gpu/drm/msm/msm_atomic.c|  8 ++-
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 +
 drivers/gpu/drm/msm/msm_fence.h | 20 +++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/msm/msm_kms.h   |  8 ---
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h | 17 ++
 include/linux/dma-fence.h   | 19 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 17 ++
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 26 +
 27 files changed, 555 insertions(+), 55 deletions(-)

-- 
2.39.1



Re: [Intel-gfx] [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks

2023-02-22 Thread Rob Clark
On Wed, Feb 22, 2023 at 3:14 PM Rob Clark  wrote:
>
> On Thu, Feb 16, 2023 at 3:12 AM Daniel Vetter  wrote:
> >
> > The stuff never really worked, and leads to lots of fun because it
> > out-of-order frees atomic states. Which upsets KASAN, among other
> > things.
> >
> > For async updates we now have a more solid solution with the
> > ->atomic_async_check and ->atomic_async_commit hooks. Support for that
> > for msm and vc4 landed. nouveau and i915 have their own commit
> > routines, doing something similar.
> >
> > For everyone else it's probably better to remove the use-after-free
> > bug, and encourage folks to use the async support instead. The
> > affected drivers which register a legacy cursor plane and don't either
> > use the new async stuff or their own commit routine are: amdgpu,
> > atmel, mediatek, qxl, rockchip, sti, sun4i, tegra, virtio, and vmwgfx.
> >
> > Inspired by an amdgpu bug report.
> >
> > v2: Drop RFC, I think with amdgpu converted over to use
> > atomic_async_check/commit done in
> >
> > commit 674e78acae0dfb4beb56132e41cbae5b60f7d662
> > Author: Nicholas Kazlauskas 
> > Date:   Wed Dec 5 14:59:07 2018 -0500
> >
> > drm/amd/display: Add fast path for cursor plane updates
> >
> > we don't have any driver anymore where we have userspace expecting
> > solid legacy cursor support _and_ they are using the atomic helpers in
> > their fully glory. So we can retire this.
> >
> > v3: Paper over msm and i915 regression. The complete_all is the only
> > thing missing afaict.
> >
> > v4: Fixup i915 fixup ...
> >
> > v5: Unallocate the crtc->event in msm to avoid hitting a WARN_ON in
> > dpu_crtc_atomic_flush(). This is a bit of a hack, but the simplest way to
> > untangle this all. Thanks to Abhinav Kumar for the debug help.
>
> Hmm, are you sure about that double-put?
>
> [  +0.501263] [ cut here ]
> [  +0.32] refcount_t: underflow; use-after-free.
> [  +0.33] WARNING: CPU: 6 PID: 1854 at lib/refcount.c:28
> refcount_warn_saturate+0xf8/0x134
> [  +0.43] Modules linked in: uinput rfcomm algif_hash
> algif_skcipher af_alg veth venus_dec venus_enc xt_cgroup xt_MASQUERADE
> qcom_spmi_temp_alarm qcom_spmi_adc_tm5 qcom_spmi_adc5 qcom_vadc_common
> cros_ec_typec typec 8021q hci_uart btqca qcom_stats venus_core
> coresight_etm4x coresight_tmc snd_soc_lpass_sc7180
> coresight_replicator coresight_funnel coresight snd_soc_sc7180
> ip6table_nat fuse ath10k_snoc ath10k_core ath mac80211 iio_trig_sysfs
> bluetooth cros_ec_sensors cfg80211 cros_ec_sensors_core
> industrialio_triggered_buffer kfifo_buf ecdh_generic ecc
> cros_ec_sensorhub lzo_rle lzo_compress r8153_ecm cdc_ether usbnet
> r8152 mii zram hid_vivaldi hid_google_hammer hid_vivaldi_common joydev
> [  +0.000189] CPU: 6 PID: 1854 Comm: DrmThread Not tainted
> 5.15.93-16271-g5ecce40dbcd4 #46
> cf9752a1c9e5b13fd13216094f52d77fa5a5f8f3
> [  +0.16] Hardware name: Google Wormdingler rev1+ INX panel board (DT)
> [  +0.08] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  +0.13] pc : refcount_warn_saturate+0xf8/0x134
> [  +0.11] lr : refcount_warn_saturate+0xf8/0x134
> [  +0.11] sp : ffc012e43930
> [  +0.08] x29: ffc012e43930 x28: ff80d31aa300 x27: 
> 024e
> [  +0.17] x26: 03bd x25: 0040 x24: 
> 0040
> [  +0.14] x23: ff8083eb1000 x22: 0002 x21: 
> ff80845bc800
> [  +0.13] x20: 0040 x19: ff80d0cecb00 x18: 
> 60014024
> [  +0.12] x17:  x16: 003c x15: 
> ffd97e21a1c0
> [  +0.12] x14: 0003 x13: 0004 x12: 
> 0001
> [  +0.14] x11: c000dfff x10: ffd97f560f50 x9 : 
> 5749cdb403550d00
> [  +0.14] x8 : 5749cdb403550d00 x7 :  x6 : 
> 372e31332020205b
> [  +0.12] x5 : ffd97f7b8b24 x4 :  x3 : 
> ffc012e43588
> [  +0.13] x2 : ffc012e43590 x1 : dfff x0 : 
> 0026
> [  +0.14] Call trace:
> [  +0.08]  refcount_warn_saturate+0xf8/0x134
> [  +0.13]  drm_crtc_commit_put+0x54/0x74
> [  +0.13]  __drm_atomic_helper_plane_destroy_state+0x64/0x68
> [  +0.13]  dpu_plane_destroy_state+0x24/0x3c
> [  +0.17]  drm_atomic_state_default_clear+0x13c/0x2d8
> [  +0.15]  __drm_atomic_state_free+0x88/0xa0
> [  +0.15]  drm_atomic_helper_update_plane+0x158/0x188
> [  +0.14]  __setplane_atomic+0xf4/0x138
> [  +0.12]  drm_mode_cursor_common+0x2e8/0x40c
> [  +0.09]  drm_mode_cur

Re: [Intel-gfx] [PATCH] drm/atomic-helpers: remove legacy_cursor_update hacks

2023-02-22 Thread Rob Clark
 References: 
> https://lore.kernel.org/all/20220221134155.125447-9-max...@cerno.tech/
> References: https://bugzilla.kernel.org/show_bug.cgi?id=199425
> Cc: Maxime Ripard 
> Tested-by: Maxime Ripard 
> Cc: mikita.lip...@amd.com
> Cc: Michel Dänzer 
> Cc: harry.wentl...@amd.com
> Cc: Rob Clark 
> Cc: "Kazlauskas, Nicholas" 
> Cc: Dmitry Osipenko 
> Cc: Maarten Lankhorst 
> Cc: Dmitry Baryshkov 
> Cc: Sean Paul 
> Cc: Matthias Brugger 
> Cc: AngeloGioacchino Del Regno 
> Cc: "Ville Syrjälä" 
> Cc: Jani Nikula 
> Cc: Lucas De Marchi 
> Cc: Imre Deak 
> Cc: Manasi Navare 
> Cc: linux-arm-...@vger.kernel.org
> Cc: freedr...@lists.freedesktop.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-media...@lists.infradead.org
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_atomic_helper.c  | 13 -
>  drivers/gpu/drm/i915/display/intel_display.c | 14 ++
>  drivers/gpu/drm/msm/msm_atomic.c | 15 +++
>  3 files changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
> b/drivers/gpu/drm/drm_atomic_helper.c
> index d579fd8f7cb8..f6b4c3a00684 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1587,13 +1587,6 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device 
> *dev,
> int i, ret;
> unsigned int crtc_mask = 0;
>
> -/*
> - * Legacy cursor ioctls are completely unsynced, and userspace
> - * relies on that (by doing tons of cursor updates).
> - */
> -   if (old_state->legacy_cursor_update)
> -   return;
> -
> for_each_oldnew_crtc_in_state(old_state, crtc, old_crtc_state, 
> new_crtc_state, i) {
> if (!new_crtc_state->active)
> continue;
> @@ -2244,12 +2237,6 @@ int drm_atomic_helper_setup_commit(struct 
> drm_atomic_state *state,
> continue;
> }
>
> -   /* Legacy cursor updates are fully unsynced. */
> -   if (state->legacy_cursor_update) {
> -   complete_all(&commit->flip_done);
> -   continue;
> -   }
> -
> if (!new_crtc_state->event) {
> commit->event = kzalloc(sizeof(*commit->event),
> GFP_KERNEL);
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
> b/drivers/gpu/drm/i915/display/intel_display.c
> index 3479125fbda6..2454451fcf95 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -7651,6 +7651,20 @@ static int intel_atomic_commit(struct drm_device *dev,
> intel_runtime_pm_put(&dev_priv->runtime_pm, state->wakeref);
> return ret;
> }
> +
> +   /*
> +* FIXME: Cut over to (async) commit helpers instead of hand-rolling
> +* everything.
> +*/
> +   if (state->base.legacy_cursor_update) {
> +   struct intel_crtc_state *new_crtc_state;
> +   struct intel_crtc *crtc;
> +   int i;
> +
> +   for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, 
> i)
> complete_all(&new_crtc_state->uapi.commit->flip_done);
> +   }
> +
> intel_shared_dpll_swap_state(state);
> intel_atomic_track_fbs(state);
>
> diff --git a/drivers/gpu/drm/msm/msm_atomic.c 
> b/drivers/gpu/drm/msm/msm_atomic.c
> index 1686fbb611fd..b7151767b567 100644
> --- a/drivers/gpu/drm/msm/msm_atomic.c
> +++ b/drivers/gpu/drm/msm/msm_atomic.c
> @@ -189,6 +189,19 @@ void msm_atomic_commit_tail(struct drm_atomic_state 
> *state)
> bool async = kms->funcs->vsync_time &&
> can_do_async(state, &async_crtc);
>
> +   /*
> +* FIXME: Convert to async plane helpers and remove the various hacks 
> to
> +* keep the old legacy_cursor_way of doing async commits working for 
> the
> +* dpu code, like the expectation that these don't have a crtc->event.
> +*/
> +   if (async) {
> +   /* both ->event itself and the pointer hold a reference! */
> +   drm_crtc_commit_put(async_crtc->state->commit);
> +   drm_crtc_commit_put(async_crtc->state->commit);
> +   kfree(async_crtc->state->event);
> +   async_crtc->state->event = NULL;
> +   }
> +
> trace_msm_atomic_commit_tail_start(async, crtc_mask);
>
> kms->funcs->enable_commit(kms);
> @@ -222,6 +235,8 @@ void msm_atomic_commit_tail(struct drm_atomic_state 
> *state)
> /* async updates are limited to single-crtc updates: */
> WARN_ON(crtc_mask != drm_crtc_mask(async_crtc));
>
> +   complete_all(&async_crtc->state->commit->flip_done);
> +
> /*
>  * Start timer if we don't already have an update pending
>  * on this crtc:
> --
> 2.39.0
>


[Intel-gfx] [PATCH v5 14/14] drm/i915: Add deadline based boost support

2023-02-20 Thread Rob Clark
From: Rob Clark 

v2: rebase

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/i915_request.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence 
*fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1
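

[A side note on the TODO in i915_fence_set_deadline() above: one
possible shape of the "track the nearest deadline and arm a timer"
idea, sketched here with invented struct and field names rather than
i915's real request/timeline machinery, whose locking is more involved:

#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

#define BOOST_LEAD_NS	(2 * NSEC_PER_MSEC)	/* arbitrary lead time */

/* Hypothetical timeline-side state; not actual i915 structures. */
struct example_timeline {
	spinlock_t	lock;
	ktime_t		nearest_deadline;	/* earliest pending deadline */
	struct hrtimer	boost_timer;		/* callback would boost, e.g.
						 * via intel_rps_boost() */
};

static void example_track_deadline(struct example_timeline *tl,
				   ktime_t deadline)
{
	unsigned long flags;

	spin_lock_irqsave(&tl->lock, flags);
	if (!tl->nearest_deadline ||
	    ktime_before(deadline, tl->nearest_deadline)) {
		tl->nearest_deadline = deadline;
		/* re-arm to fire a bit before the new, earlier deadline */
		hrtimer_start(&tl->boost_timer,
			      ktime_sub_ns(deadline, BOOST_LEAD_NS),
			      HRTIMER_MODE_ABS);
	}
	spin_unlock_irqrestore(&tl->lock, flags);
}

Deadlines already in the past would still boost immediately, as the
patch does today.]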



[Intel-gfx] [PATCH v5 00/14] dma-fence: Deadline awareness

2023-02-20 Thread Rob Clark
From: Rob Clark 

This series adds a deadline hint to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline
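
For reference, a minimal userspace sketch of setting a deadline hint on
a sync_file fd. The struct layout and ioctl number below follow the
upstream definition of SYNC_IOC_SET_DEADLINE; treat it as illustrative
of the idea rather than the exact v5 UABI:

/*
 * Illustrative only: set an absolute CLOCK_MONOTONIC deadline hint on a
 * sync_file fd (e.g. an out-fence a compositor intends to scan out).
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <time.h>

struct sync_set_deadline {
	uint64_t deadline_ns;	/* absolute CLOCK_MONOTONIC time, in ns */
	uint64_t pad;		/* must be zero */
};
#define SYNC_IOC_MAGIC		'>'
#define SYNC_IOC_SET_DEADLINE	_IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)

static int set_fence_deadline(int fence_fd, uint64_t ns_from_now)
{
	struct timespec ts;
	struct sync_set_deadline args = { .pad = 0 };

	clock_gettime(CLOCK_MONOTONIC, &ts);
	args.deadline_ns = (uint64_t)ts.tv_sec * 1000000000ull +
			   (uint64_t)ts.tv_nsec + ns_from_now;

	return ioctl(fence_fd, SYNC_IOC_SET_DEADLINE, &args);
}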


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)
v5: Clarify that this is a hint as opposed to a more hard deadline
guarantee, switch to using u64 ns values in UABI (still absolute
CLOCK_MONOTONIC values), drop syncobj related cap and driver
feature flag in favor of allowing count_handles==0 for probing
kernel support.

Rob Clark (14):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sync_file: Support (E)POLLPRI
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/i915: Add deadline based boost support

 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 13 +
 drivers/dma-buf/dma-fence.c | 21 +++
 drivers/dma-buf/dma-resv.c  | 22 
 drivers/dma-buf/sw_sync.c   | 58 +++
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 27 +
 drivers/gpu/drm/drm_atomic_helper.c | 36 
 drivers/gpu/drm/drm_syncobj.c   | 59 +++-
 drivers/gpu/drm/drm_vblank.c| 32 +++
 drivers/gpu/drm/i915/i915_request.c | 20 +++
 drivers/gpu/drm/msm/msm_drv.c   | 12 ++--
 drivers/gpu/drm/msm/msm_fence.c | 74 +
 drivers/gpu/drm/msm/msm_fence.h | 20 +++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h |  8 +++
 include/linux/dma-fence.h   | 20 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  |  5 ++
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 23 
 24 files changed, 513 insertions(+), 20 deletions(-)

-- 
2.39.1



Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-20 Thread Rob Clark
On Mon, Feb 20, 2023 at 8:51 AM Tvrtko Ursulin
 wrote:
>
>
> On 20/02/2023 16:44, Tvrtko Ursulin wrote:
> >
> > On 20/02/2023 15:52, Rob Clark wrote:
> >> On Mon, Feb 20, 2023 at 3:33 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 17/02/2023 20:45, Rodrigo Vivi wrote:
> >
> > [snip]
> >
> >>> Yeah I agree. And as not all media use cases are the same, as are not
> >>> all compute contexts someone somewhere will need to run a series of
> >>> workloads for power and performance numbers. Ideally that someone would
> >>> be the entity for which it makes sense to look at all use cases, from
> >>> server room to client, 3d, media and compute for both. If we could get
> >>> the capability to run this in some automated fashion, akin to CI, we
> >>> would even have a chance to keep making good decisions in the future.
> >>>
> >>> Or we do some one off testing for this instance, but we still need a
> >>> range of workloads and parts to do it properly..
> >>>
> >>>>> I also think the "arms race" scenario isn't really as much of a
> >>>>> problem as you think.  There aren't _that_ many things using the GPU
> >>>>> at the same time (compared to # of things using CPU).   And a lot of
> >>>>> mobile games throttle framerate to avoid draining your battery too
> >>>>> quickly (after all, if your battery is dead you can't keep buying loot
> >>>>> boxes or whatever).
> >>>>
> >>>> Very good point.
> >>>
> >>> On this one I still disagree from the point of view that it does not
> >>> make it good uapi if we allow everyone to select themselves for priority
> >>> handling (one flavour or the other).
> >>
> >> There is plenty of precedent for userspace giving hints to the kernel
> >> about scheduling and freq mgmt.  Like schedutil uclamp stuff.
> >> Although I think that is all based on cgroups.
> >
> > I knew about SCHED_DEADLINE and that it requires CAP_SYS_NICE, but I did
> > not know about uclamp. Quick experiment with uclampset suggests it
> > indeed does not require elevated privilege. If that is indeed so, it is
> > good enough for me as a precedent.
> >
> > It appears to work using sched_setscheduler so maybe could define
> > something similar in i915/xe, per context or per client, not sure.
> >
> > Maybe it would start as a primitive implementation but the uapi would
> > not preclude making it smart(er) afterwards. Or passing along to GuC to
> > do its thing with it.
>
> Hmmm having said that, how would we fix clvk performance using that? We
> would either need the library to do a new step when creating contexts,
> or allow external control so outside entity can do it. And then the
> question is based on what it decides to do it? Is it possible to know
> which, for instance, Chrome tab will be (or is) using clvk so that tab
> management code does it?

I am not sure.. the clvk usage is, I think, not actually in chrome
itself, but something camera related?

Presumably we could build some cgroup knobs to control how the driver
reacts to the "deadline" hints (ie. ignore them completely, or impose
some upper limit on how much freq boost will be applied, etc).  I
think this sort of control of how the driver responds to hints
probably fits best with cgroups, as that is how we are already
implementing similar tuning for cpufreq/sched.  (Ie. foreground app or
tab gets moved to a different cgroup.)  But admittedly I haven't
looked too closely at how cgroups work on the kernel side.

BR,
-R

> Regards,
>
> Tvrtko
>
> >> In the fence/syncobj case, I think we need per-wait hints.. because
> >> for a single process the driver will be doing both housekeeping waits
> >> and potentially urgent waits.  There may also be some room for some
> >> cgroup or similar knobs to control things like what max priority an
> >> app can ask for, and whether or how aggressively the kernel responds
> >> to the "deadline" hints.  So as far as "arms race", I don't think I'd
> >
> > Per wait hints are okay I guess even with "I am important" in their name
> > if sched_setscheduler allows raising uclamp.min just like that. In which
> > case cgroup limits to mimic cpu uclamp also make sense.
> >
> >> change anything about my "fence deadline" proposal.. but that it might
> >> just be one piece of the overall puzzle.
> >
> > That SCHED_DEADLINE requires CAP_SYS_NICE does not worry you?
> >
> > Regards,
> >
> > Tvrtko


Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-20 Thread Rob Clark
On Mon, Feb 20, 2023 at 8:44 AM Tvrtko Ursulin
 wrote:
>
>
> On 20/02/2023 15:52, Rob Clark wrote:
> > On Mon, Feb 20, 2023 at 3:33 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 17/02/2023 20:45, Rodrigo Vivi wrote:
>
> [snip]
>
> >> Yeah I agree. And as not all media use cases are the same, as are not
> >> all compute contexts someone somewhere will need to run a series of
> >> workloads for power and performance numbers. Ideally that someone would
> >> be the entity for which it makes sense to look at all use cases, from
> >> server room to client, 3d, media and compute for both. If we could get
> >> the capability to run this in some automated fashion, akin to CI, we
> >> would even have a chance to keep making good decisions in the future.
> >>
> >> Or we do some one off testing for this instance, but we still need a
> >> range of workloads and parts to do it properly..
> >>
> >>>> I also think the "arms race" scenario isn't really as much of a
> >>>> problem as you think.  There aren't _that_ many things using the GPU
> >>>> at the same time (compared to # of things using CPU).   And a lot of
> >>>> mobile games throttle framerate to avoid draining your battery too
> >>>> quickly (after all, if your battery is dead you can't keep buying loot
> >>>> boxes or whatever).
> >>>
> >>> Very good point.
> >>
> >> On this one I still disagree from the point of view that it does not
> >> make it good uapi if we allow everyone to select themselves for priority
> >> handling (one flavour or the other).
> >
> > There is plenty of precedent for userspace giving hints to the kernel
> > about scheduling and freq mgmt.  Like schedutil uclamp stuff.
> > Although I think that is all based on cgroups.
>
> I knew about SCHED_DEADLINE and that it requires CAP_SYS_NICE, but I did
> not know about uclamp. Quick experiment with uclampset suggests it
> indeed does not require elevated privilege. If that is indeed so, it is
> good enough for me as a precedent.
>
> It appears to work using sched_setscheduler so maybe could define
> something similar in i915/xe, per context or per client, not sure.
>
> Maybe it would start as a primitive implementation but the uapi would
> not preclude making it smart(er) afterwards. Or passing along to GuC to
> do its thing with it.
>
> > In the fence/syncobj case, I think we need per-wait hints.. because
> > for a single process the driver will be doing both housekeeping waits
> > and potentially urgent waits.  There may also be some room for some
> > cgroup or similar knobs to control things like what max priority an
> > app can ask for, and whether or how aggressively the kernel responds
> > to the "deadline" hints.  So as far as "arms race", I don't think I'd
>
> Per wait hints are okay I guess even with "I am important" in their name
> if sched_setscheduler allows raising uclamp.min just like that. In which
> case cgroup limits to mimic cpu uclamp also make sense.
>
> > change anything about my "fence deadline" proposal.. but that it might
> > just be one piece of the overall puzzle.
>
> That SCHED_DEADLINE requires CAP_SYS_NICE does not worry you?

This gets to why the name "fence deadline" is perhaps not the best..
it really isn't meant to be analogous to SCHED_DEADLINE, but rather
just a hint to the driver about what userspace is doing.  Maybe we
just document it more strongly as a hint?

BR,
-R

> Regards,
>
> Tvrtko


Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-20 Thread Rob Clark
On Mon, Feb 20, 2023 at 3:33 AM Tvrtko Ursulin
 wrote:
>
>
> On 17/02/2023 20:45, Rodrigo Vivi wrote:
> > On Fri, Feb 17, 2023 at 09:00:49AM -0800, Rob Clark wrote:
> >> On Fri, Feb 17, 2023 at 8:03 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 17/02/2023 14:55, Rob Clark wrote:
> >>>> On Fri, Feb 17, 2023 at 4:56 AM Tvrtko Ursulin
> >>>>  wrote:
> >>>>>
> >>>>>
> >>>>> On 16/02/2023 18:19, Rodrigo Vivi wrote:
> >>>>>> On Tue, Feb 14, 2023 at 11:14:00AM -0800, Rob Clark wrote:
> >>>>>>> On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> From: Tvrtko Ursulin 
> >>>>>>>>
> >>>>>>>> In i915 we have this concept of "wait boosting" where we give a 
> >>>>>>>> priority boost
> >>>>>>>> for instance to fences which are actively waited upon from 
> >>>>>>>> userspace. This has
> >>>>>>>> its pros and cons and can certainly be discussed at length. However 
> >>>>>>>> fact is
> >>>>>>>> some workloads really like it.
> >>>>>>>>
> >>>>>>>> Problem is that with the arrival of drm syncobj and a new userspace 
> >>>>>>>> waiting
> >>>>>>>> entry point it added, the waitboost mechanism was bypassed. Hence I 
> >>>>>>>> cooked up
> >>>>>>>> this mini series really (really) quickly to see if some discussion 
> >>>>>>>> can be had.
> >>>>>>>>
> >>>>>>>> It adds a concept of "wait count" to dma fence, which is incremented 
> >>>>>>>> for every
> >>>>>>>> explicit dma_fence_enable_sw_signaling and 
> >>>>>>>> dma_fence_add_wait_callback (like
> >>>>>>>> dma_fence_add_callback but from explicit/userspace wait paths).
> >>>>>>>
> >>>>>>> I was thinking about a similar thing, but in the context of dma_fence
> >>>>>>> (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> >>>>>>> between "housekeeping" poll()ers that don't want to trigger boost but
> >>>>>>> simply know when to do cleanup, and waiters who are waiting with some
> >>>>>>> urgency.  I think we could use EPOLLPRI for this purpose.
> >>>>>>>
> >>>>>>> Not sure how that translates to waits via the syncobj.  But I think we
> >>>>>>> want to let userspace give some hint about urgent vs housekeeping
> >>>>>>> waits.
> >>>>>>
> >>>>>> Should the hint be on the waits, or should the hints be on the executed
> >>>>>> context?
> >>>>>>
> >>>>>> In the end we need some way to quickly ramp-up the frequency to avoid
> >>>>>> the execution bubbles.
> >>>>>>
> >>>>>> waitboost is trying to guess that, but in some cases it guess wrong
> >>>>>> and waste power.
> >>>>>
> >>>>> Do we have a list of workloads which shows who benefits and who loses
> >>>>> from the current implementation of waitboost?
> >>>>>> btw, this is something that other drivers might need:
> >>>>>>
> >>>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/1500#note_825883
> >>>>>> Cc: Alex Deucher 
> >>>>>
> >>>>> I have several issues with the context hint if it would directly
> >>>>> influence frequency selection in the "more power" direction.
> >>>>>
> >>>>> First of all, assume a context hint would replace the waitboost. Which
> >>>>> applications would need to set it to restore the lost performance and
> >>>>> how would they set it?
> >>>>>
> >>>>> Then I don't even think userspace necessarily knows. Think of a layer
> >>>>> like OpenCL. It doesn't really know in advance the profile of
> >>>>> submissions vs waits. It depends on 

Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-20 Thread Rob Clark
On Mon, Feb 20, 2023 at 4:22 AM Tvrtko Ursulin
 wrote:
>
>
> On 17/02/2023 17:00, Rob Clark wrote:
> > On Fri, Feb 17, 2023 at 8:03 AM Tvrtko Ursulin
> >  wrote:
>
> [snip]
>
> >>> adapted from your patches..  I think the basic idea of deadlines
> >>> (which includes "I want it NOW" ;-)) isn't controversial, but the
> >>> original idea got caught up in some bikeshed (what about compositors
> >>> that wait on fences in userspace to decide which surfaces to update in
> >>> the next frame), plus me getting busy and generally not having a good
> >>> plan for how to leverage this from VM guests (which is becoming
> >>> increasingly important for CrOS).  I think I can build on some ongoing
> >>> virtgpu fencing improvement work to solve the latter.  But now that we
> >>> have a 2nd use-case for this, it makes sense to respin.
> >>
> >> Sure, I was looking at the old version already. It is interesting. But
> >> also IMO needs quite a bit more work to approach achieving what is
> >> implied from the name of the feature. It would need proper deadline
> >> based sched job picking, and even then drm sched is mostly just a
> >> frontend. So once past runnable status and jobs handed over to backend,
> >> without further driver work it probably wouldn't be very effective past
> >> very lightly loaded systems.
> >
> > Yes, but all of that is not part of dma_fence ;-)
>
> :) Okay.
>
> Having said that, do we need a step back to think about whether adding
> deadline to dma-fences is not making them something too much different
> to what they were? Going from purely synchronisation primitive more
> towards scheduling paradigms. Just to brainstorm if there will not be
> any unintended consequences. I should mention this in your RFC thread
> actually.

Perhaps "deadline" isn't quite the right name, but I haven't thought
of anything better.  It is really a hint to the fence signaller about
how soon it is interested in a result so the driver can factor that
into freq scaling decisions.  Maybe "goal" or some other term would be
better?

I guess that can factor into scheduling decisions as well.. but we
already have priority for that.  My main interest is freq mgmt.

(Thankfully we don't have performance and efficiency cores to worry
about, like CPUs ;-))

> > A pretty common challenging usecase is still the single fullscreen
> > game, where scheduling isn't the problem, but landing at an
> > appropriate GPU freq absolutely is.  (UI workloads are perhaps more
> > interesting from a scheduler standpoint, but they generally aren't
> > challenging from a load/freq standpoint.)
>
> Challenging as in picking the right operating point? Might be latency
> impacted (and so user-perceived UI smoothness) due to missing waitboost for
> anything syncobj related. I don't know if anything to measure that
> exists currently though. Assuming it is measurable then the question
> would be is it perceivable.
> > Fwiw, the original motivation of the series was to implement something
> > akin to i915 pageflip boosting without having to abandon the atomic
> > helpers.  (And, I guess it would also let i915 preserve that feature
> > if it switched to atomic helpers.. I'm unsure if there are still other
> > things blocking i915's migration.)
>
> Question for display folks I guess.
>
> >> Then if we fast forward to a world where schedulers perhaps become fully
> >> deadline aware (we even had this for i915 few years back) then the
> >> question will be does equating waits with immediate deadlines still
> >> works. Maybe not too well because we wouldn't have the ability to
> >> distinguish between the "someone is waiting" signal from the otherwise
> >> propagated deadlines.
> >
> > Is there any other way to handle a wait boost than expressing it as an
> > ASAP deadline?
>
> A leading question or just a question? Nothing springs to my mind at the
> moment.

Just a question.  The immediate deadline is the only thing that makes
sense to me, but that could be because I'm looking at it from the
perspective of also trying to handle the case where missing vblank
reduces utilization and provides the wrong signal to gpufreq.. i915
already has a way to handle this internally, but it involves bypassing
the atomic helpers, which isn't a thing I want to encourage other
drivers to do.  And completely doesn't work for situations where the
gpu and display are separate devices.

BR,
-R

> Regards,
>
> Tvrtko


Re: [Intel-gfx] [RFC 6/9] drm/syncobj: Mark syncobj waits as external waiters

2023-02-20 Thread Rob Clark
On Mon, Feb 20, 2023 at 5:19 AM Tvrtko Ursulin
 wrote:
>
>
> On 18/02/2023 19:56, Rob Clark wrote:
> > On Thu, Feb 16, 2023 at 2:59 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> Use the previously added dma-fence tracking of explicit waiters.
> >>
> >> Signed-off-by: Tvrtko Ursulin 
> >> ---
> >>   drivers/gpu/drm/drm_syncobj.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> >> index 0c2be8360525..776b90774a64 100644
> >> --- a/drivers/gpu/drm/drm_syncobj.c
> >> +++ b/drivers/gpu/drm/drm_syncobj.c
> >> @@ -1065,9 +1065,9 @@ static signed long 
> >> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
> >>  if ((flags & 
> >> DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) ||
> >>  dma_fence_is_signaled(fence) ||
> >>  (!entries[i].fence_cb.func &&
> >> -dma_fence_add_callback(fence,
> >> -   &entries[i].fence_cb,
> >> -   syncobj_wait_fence_func))) {
> >> +dma_fence_add_wait_callback(fence,
> >> +&entries[i].fence_cb,
> >> +syncobj_wait_fence_func))) {
> >
> > I think this isn't really what you want if count > 1, because you
> > wouldn't be notifying the fence signaler of fence n+1 until the wait
> > on fence n completed
>
> Are you sure? After some staring all I can see is that all callbacks are
> added before the first sleep.

Ahh, yes, you are right

BR,
-R


[Intel-gfx] [PATCH v4 14/14] drm/i915: Add deadline based boost support

2023-02-18 Thread Rob Clark
From: Rob Clark 

Signed-off-by: Rob Clark 
---

This should probably be re-written by someone who knows the i915
request/timeline stuff better, to deal with non-immediate deadlines.
But as-is I think this should be enough to handle the case where
we want syncobj waits to trigger boost.

 drivers/gpu/drm/i915/i915_driver.c  |  2 +-
 drivers/gpu/drm/i915/i915_request.c | 20 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_driver.c 
b/drivers/gpu/drm/i915/i915_driver.c
index cf1c0970ecb4..bd40b7bcb38a 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -1781,7 +1781,7 @@ static const struct drm_driver i915_drm_driver = {
.driver_features =
DRIVER_GEM |
DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ |
-   DRIVER_SYNCOBJ_TIMELINE,
+   DRIVER_SYNCOBJ_TIMELINE | DRIVER_SYNCOBJ_DEADLINE,
.release = i915_driver_release,
.open = i915_driver_open,
.lastclose = i915_driver_lastclose,
diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 7503dcb9043b..44491e7e214c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -97,6 +97,25 @@ static bool i915_fence_enable_signaling(struct dma_fence 
*fence)
return i915_request_enable_breadcrumb(to_request(fence));
 }
 
+static void i915_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+   struct i915_request *rq = to_request(fence);
+
+   if (i915_request_completed(rq))
+   return;
+
+   if (i915_request_started(rq))
+   return;
+
+   /*
+* TODO something more clever for deadlines that are in the
+* future.  I think probably track the nearest deadline in
+* rq->timeline and set timer to trigger boost accordingly?
+*/
+
+   intel_rps_boost(rq);
+}
+
 static signed long i915_fence_wait(struct dma_fence *fence,
   bool interruptible,
   signed long timeout)
@@ -182,6 +201,7 @@ const struct dma_fence_ops i915_fence_ops = {
.signaled = i915_fence_signaled,
.wait = i915_fence_wait,
.release = i915_fence_release,
+   .set_deadline = i915_fence_set_deadline,
 };
 
 static void irq_execute_cb(struct irq_work *wrk)
-- 
2.39.1



[Intel-gfx] [PATCH v4 00/14] dma-fence: Deadline awareness

2023-02-18 Thread Rob Clark
From: Rob Clark 

This series adds deadline awareness to fences, so realtime deadlines
such as vblank can be communicated to the fence signaller for power/
frequency management decisions.

This is partially inspired by a trick i915 does, but implemented
via dma-fence for a couple of reasons:

1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers

This iteration adds a dma-fence ioctl to set a deadline (both to
support igt-tests, and compositors which delay decisions about which
client buffer to display), and a sw_sync ioctl to read back the
deadline.  IGT tests utilizing these can be found at:

  https://gitlab.freedesktop.org/robclark/igt-gpu-tools/-/commits/fence-deadline


v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation
to avoid increasing the size of dma_fence
v3: Add support in fence-array and fence-chain; Add some uabi to
support igt tests and userspace compositors.
v4: Rebase, address various comments, and add syncobj deadline
support, and sync_file EPOLLPRI based on experience with perf/
freq issues with clvk compute workloads on i915 (anv)

Rob Clark (14):
  dma-buf/dma-fence: Add deadline awareness
  dma-buf/fence-array: Add fence deadline support
  dma-buf/fence-chain: Add fence deadline support
  dma-buf/dma-resv: Add a way to set fence deadline
  dma-buf/sync_file: Add SET_DEADLINE ioctl
  dma-buf/sync_file: Support (E)POLLPRI
  dma-buf/sw_sync: Add fence deadline support
  drm/scheduler: Add fence deadline support
  drm/syncobj: Add deadline support for syncobj waits
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/msm: Add deadline based boost support
  drm/msm: Add wait-boost support
  drm/i915: Add deadline based boost support

 drivers/dma-buf/dma-fence-array.c   | 11 
 drivers/dma-buf/dma-fence-chain.c   | 13 +
 drivers/dma-buf/dma-fence.c | 20 +++
 drivers/dma-buf/dma-resv.c  | 19 +++
 drivers/dma-buf/sw_sync.c   | 58 +++
 drivers/dma-buf/sync_debug.h|  2 +
 drivers/dma-buf/sync_file.c | 27 +
 drivers/gpu/drm/drm_atomic_helper.c | 36 
 drivers/gpu/drm/drm_ioctl.c |  3 +
 drivers/gpu/drm/drm_syncobj.c   | 59 
 drivers/gpu/drm/drm_vblank.c| 32 +++
 drivers/gpu/drm/i915/i915_driver.c  |  2 +-
 drivers/gpu/drm/i915/i915_request.c | 20 +++
 drivers/gpu/drm/msm/msm_drv.c   | 16 --
 drivers/gpu/drm/msm/msm_fence.c | 74 +
 drivers/gpu/drm/msm/msm_fence.h | 20 +++
 drivers/gpu/drm/msm/msm_gem.c   |  5 ++
 drivers/gpu/drm/scheduler/sched_fence.c | 46 +++
 drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
 include/drm/drm_drv.h   |  6 ++
 include/drm/drm_vblank.h|  1 +
 include/drm/gpu_scheduler.h |  8 +++
 include/linux/dma-fence.h   | 20 +++
 include/linux/dma-resv.h|  2 +
 include/uapi/drm/drm.h  | 16 +-
 include/uapi/drm/msm_drm.h  | 14 -
 include/uapi/linux/sync_file.h  | 22 
 27 files changed, 532 insertions(+), 22 deletions(-)

-- 
2.39.1



Re: [Intel-gfx] [RFC 6/9] drm/syncobj: Mark syncobj waits as external waiters

2023-02-18 Thread Rob Clark
On Thu, Feb 16, 2023 at 2:59 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Use the previously added dma-fence tracking of explicit waiters.
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 0c2be8360525..776b90774a64 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1065,9 +1065,9 @@ static signed long 
> drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
> if ((flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE) ||
> dma_fence_is_signaled(fence) ||
> (!entries[i].fence_cb.func &&
> -dma_fence_add_callback(fence,
> -   &entries[i].fence_cb,
> -   syncobj_wait_fence_func))) {
> +dma_fence_add_wait_callback(fence,
> +&entries[i].fence_cb,
> +syncobj_wait_fence_func))) {

I think this isn't really what you want if count > 1, because you
wouldn't be notifying the fence signaler of fence n+1 until the wait
on fence n completed

BR,
-R

> /* The fence has been signaled */
> if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
> signaled_count++;
> --
> 2.34.1
>


Re: [Intel-gfx] [RFC 5/9] dma-fence: Track explicit waiters

2023-02-18 Thread Rob Clark
On Thu, Feb 16, 2023 at 3:00 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> Track how many callers are explicitly waiting on a fence to signal and
> allow querying that via new dma_fence_wait_count() API.
>
> This provides infrastructure on top of which generic "waitboost" concepts
> can be implemented by individual drivers. Wait-boosting is any reactive
> activity, such as raising the GPU clocks, which happens while there are
> active external waiters.
>
> Signed-off-by: Tvrtko Ursulin 
> ---
>  drivers/dma-buf/dma-fence.c   | 98 +--
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c |  1 -
>  include/linux/dma-fence.h | 15 
>  3 files changed, 87 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ea4a1f82c9bf..bdba5a8e21b1 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -344,6 +344,25 @@ void __dma_fence_might_wait(void)
>  }
>  #endif
>
> +static void incr_wait_count(struct dma_fence *fence, struct dma_fence_cb *cb)
> +{
> +   lockdep_assert_held(fence->lock);
> +
> +   __set_bit(DMA_FENCE_CB_FLAG_WAITCOUNT_BIT, &cb->flags);
> +   fence->waitcount++;
> +   WARN_ON_ONCE(!fence->waitcount);
> +}
> +
> +static void decr_wait_count(struct dma_fence *fence, struct dma_fence_cb *cb)
> +{
> +   lockdep_assert_held(fence->lock);
> +
> +   if (__test_and_clear_bit(DMA_FENCE_CB_FLAG_WAITCOUNT_BIT, &cb->flags)) {
> +   WARN_ON_ONCE(!fence->waitcount);
> +   fence->waitcount--;
> +   }
> +}
> +
>  void __dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t 
> timestamp)
>  {
> lockdep_assert_held(fence->lock);
> @@ -363,6 +382,7 @@ __dma_fence_signal__notify(struct dma_fence *fence,
> lockdep_assert_held(fence->lock);
>
> list_for_each_entry_safe(cur, tmp, list, node) {
> +   decr_wait_count(fence, cur);
> INIT_LIST_HEAD(&cur->node);
> cur->func(fence, cur);
> }
> @@ -629,11 +649,44 @@ void dma_fence_enable_sw_signaling(struct dma_fence 
> *fence)
> unsigned long flags;
>
> spin_lock_irqsave(fence->lock, flags);
> +   fence->waitcount++;
> +   WARN_ON_ONCE(!fence->waitcount);
> __dma_fence_enable_signaling(fence);
> spin_unlock_irqrestore(fence->lock, flags);
>  }
>  EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>
> +static int add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
> +   dma_fence_func_t func, bool wait)
> +{
> +   unsigned long flags;
> +   int ret = 0;
> +
> +   __dma_fence_cb_init(cb, func);
> +
> +   if (WARN_ON(!fence || !func))
> +   return -EINVAL;
> +
> +   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +   return -ENOENT;
> +
> +   spin_lock_irqsave(fence->lock, flags);
> +
> +   if (wait)
> +   incr_wait_count(fence, cb);
> +
> +   if (__dma_fence_enable_signaling(fence)) {
> +   list_add_tail(&cb->node, &fence->cb_list);
> +   } else {
> +   decr_wait_count(fence, cb);
> +   ret = -ENOENT;
> +   }
> +
> +   spin_unlock_irqrestore(fence->lock, flags);
> +
> +   return ret;
> +}
> +
>  /**
>   * dma_fence_add_callback - add a callback to be called when the fence
>   * is signaled
> @@ -659,31 +712,18 @@ EXPORT_SYMBOL(dma_fence_enable_sw_signaling);
>  int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
>dma_fence_func_t func)
>  {
> -   unsigned long flags;
> -   int ret = 0;
> -
> -   __dma_fence_cb_init(cb, func);
> -
> -   if (WARN_ON(!fence || !func))
> -   return -EINVAL;
> -
> -   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> -   return -ENOENT;
> -
> -   spin_lock_irqsave(fence->lock, flags);
> -
> -   if (__dma_fence_enable_signaling(fence)) {
> -   list_add_tail(&cb->node, &fence->cb_list);
> -   } else {
> -   ret = -ENOENT;
> -   }
> -
> -   spin_unlock_irqrestore(fence->lock, flags);
> -
> -   return ret;
> +   return add_callback(fence, cb, func, false);
>  }
>  EXPORT_SYMBOL(dma_fence_add_callback);
>
> +int dma_fence_add_wait_callback(struct dma_fence *fence,
> +   struct dma_fence_cb *cb,
> +   dma_fence_func_t func)
> +{
> +   return add_callback(fence, cb, func, true);
> +}
> +EXPORT_SYMBOL(dma_fence_add_wait_callback);
> +
>  /**
>   * dma_fence_get_status - returns the status upon completion
>   * @fence: the dma_fence to query
> @@ -736,8 +776,10 @@ dma_fence_remove_callback(struct dma_fence *fence, 
> struct dma_fence_cb *cb)
> spin_lock_irqsave(fence->lock, flags);
>
> ret = !list_empty(&cb->node);
> -   if (ret)
> +   if (ret) {
> +   decr_wait_count(fence, cb);
> 

Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-17 Thread Rob Clark
On Fri, Feb 17, 2023 at 12:45 PM Rodrigo Vivi  wrote:
>
> On Fri, Feb 17, 2023 at 09:00:49AM -0800, Rob Clark wrote:
> > On Fri, Feb 17, 2023 at 8:03 AM Tvrtko Ursulin
> >  wrote:
> > >
> > >
> > > On 17/02/2023 14:55, Rob Clark wrote:
> > > > On Fri, Feb 17, 2023 at 4:56 AM Tvrtko Ursulin
> > > >  wrote:
> > > >>
> > > >>
> > > >> On 16/02/2023 18:19, Rodrigo Vivi wrote:
> > > >>> On Tue, Feb 14, 2023 at 11:14:00AM -0800, Rob Clark wrote:
> > > >>>> On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> > > >>>>  wrote:
> > > >>>>>
> > > >>>>> From: Tvrtko Ursulin 
> > > >>>>>
> > > >>>>> In i915 we have this concept of "wait boosting" where we give a 
> > > >>>>> priority boost
> > > >>>>> for instance to fences which are actively waited upon from 
> > > >>>>> userspace. This has
> > > >>>>> its pros and cons and can certainly be discussed at length. 
> > > >>>>> However fact is
> > > >>>>> some workloads really like it.
> > > >>>>>
> > > >>>>> Problem is that with the arrival of drm syncobj and a new userspace 
> > > >>>>> waiting
> > > >>>>> entry point it added, the waitboost mechanism was bypassed. Hence I 
> > > >>>>> cooked up
> > > >>>>> this mini series really (really) quickly to see if some discussion 
> > > >>>>> can be had.
> > > >>>>>
> > > >>>>> It adds a concept of "wait count" to dma fence, which is 
> > > >>>>> incremented for every
> > > >>>>> explicit dma_fence_enable_sw_signaling and 
> > > >>>>> dma_fence_add_wait_callback (like
> > > >>>>> dma_fence_add_callback but from explicit/userspace wait paths).
> > > >>>>
> > > >>>> I was thinking about a similar thing, but in the context of dma_fence
> > > >>>> (or rather sync_file) fd poll()ing.  How does the kernel 
> > > >>>> differentiate
> > > >>>> between "housekeeping" poll()ers that don't want to trigger boost but
> > > >>>> simply know when to do cleanup, and waiters who are waiting with some
> > > >>>> urgency.  I think we could use EPOLLPRI for this purpose.
> > > >>>>
> > > >>>> Not sure how that translates to waits via the syncobj.  But I think 
> > > >>>> we
> > > >>>> want to let userspace give some hint about urgent vs housekeeping
> > > >>>> waits.
> > > >>>
> > > >>> Should the hint be on the waits, or should the hints be on the 
> > > >>> executed
> > > >>> context?
> > > >>>
> > > >>> In the end we need some way to quickly ramp-up the frequency to avoid
> > > >>> the execution bubbles.
> > > >>>
> > > >>> waitboost is trying to guess that, but in some cases it guess wrong
> > > >>> and waste power.
> > > >>
> > > >> Do we have a list of workloads which shows who benefits and who loses
> > > >> from the current implementation of waitboost?
> > > >>> btw, this is something that other drivers might need:
> > > >>>
> > > >>> https://gitlab.freedesktop.org/drm/amd/-/issues/1500#note_825883
> > > >>> Cc: Alex Deucher 
> > > >>
> > > >> I have several issues with the context hint if it would directly
> > > >> influence frequency selection in the "more power" direction.
> > > >>
> > > >> First of all, assume a context hint would replace the waitboost. Which
> > > >> applications would need to set it to restore the lost performance and
> > > >> how would they set it?
> > > >>
> > > >> Then I don't even think userspace necessarily knows. Think of a layer
> > > >> like OpenCL. It doesn't really know in advance the profile of
> > > >> submissions vs waits. It depends on the CPU vs GPU speed, so hardware
> > > >> generation, and the actual size of the workload which can be

Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-17 Thread Rob Clark
On Fri, Feb 17, 2023 at 8:03 AM Tvrtko Ursulin
 wrote:
>
>
> On 17/02/2023 14:55, Rob Clark wrote:
> > On Fri, Feb 17, 2023 at 4:56 AM Tvrtko Ursulin
> >  wrote:
> >>
> >>
> >> On 16/02/2023 18:19, Rodrigo Vivi wrote:
> >>> On Tue, Feb 14, 2023 at 11:14:00AM -0800, Rob Clark wrote:
> >>>> On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> >>>>  wrote:
> >>>>>
> >>>>> From: Tvrtko Ursulin 
> >>>>>
> >>>>> In i915 we have this concept of "wait boosting" where we give a 
> >>>>> priority boost
> >>>>> for instance to fences which are actively waited upon from userspace. 
> >>>>> This has
> >>>>> its pros and cons and can certainly be discussed at length. However 
> >>>>> fact is
> >>>>> some workloads really like it.
> >>>>>
> >>>>> Problem is that with the arrival of drm syncobj and a new userspace 
> >>>>> waiting
> >>>>> entry point it added, the waitboost mechanism was bypassed. Hence I 
> >>>>> cooked up
> >>>>> this mini series really (really) quickly to see if some discussion can 
> >>>>> be had.
> >>>>>
> >>>>> It adds a concept of "wait count" to dma fence, which is incremented 
> >>>>> for every
> >>>>> explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback 
> >>>>> (like
> >>>>> dma_fence_add_callback but from explicit/userspace wait paths).
> >>>>
> >>>> I was thinking about a similar thing, but in the context of dma_fence
> >>>> (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> >>>> between "housekeeping" poll()ers that don't want to trigger boost but
> >>>> simply know when to do cleanup, and waiters who are waiting with some
> >>>> urgency.  I think we could use EPOLLPRI for this purpose.
> >>>>
> >>>> Not sure how that translates to waits via the syncobj.  But I think we
> >>>> want to let userspace give some hint about urgent vs housekeeping
> >>>> waits.
> >>>
> >>> Should the hint be on the waits, or should the hints be on the executed
> >>> context?
> >>>
> >>> In the end we need some way to quickly ramp-up the frequency to avoid
> >>> the execution bubbles.
> >>>
> >>> waitboost is trying to guess that, but in some cases it guess wrong
> >>> and waste power.
> >>
> >> Do we have a list of workloads which shows who benefits and who loses
> >> from the current implementation of waitboost?
> >>> btw, this is something that other drivers might need:
> >>>
> >>> https://gitlab.freedesktop.org/drm/amd/-/issues/1500#note_825883
> >>> Cc: Alex Deucher 
> >>
> >> I have several issues with the context hint if it would directly
> >> influence frequency selection in the "more power" direction.
> >>
> >> First of all, assume a context hint would replace the waitboost. Which
> >> applications would need to set it to restore the lost performance and
> >> how would they set it?
> >>
> >> Then I don't even think userspace necessarily knows. Think of a layer
> >> like OpenCL. It doesn't really know in advance the profile of
> >> submissions vs waits. It depends on the CPU vs GPU speed, so hardware
> >> generation, and the actual size of the workload which can be influenced
> >> by the application (or user) and not the library.
> >>
> >> The approach also lends itself well for the "arms race" where every
> >> application can say "Me me me, I am the most important workload there is!".
> >
> > since there is discussion happening in two places:
> >
> > https://gitlab.freedesktop.org/drm/intel/-/issues/8014#note_1777433
> >
> > What I think you might want is a ctx boost_mask which lets an app or
> > driver disable certain boost signals/classes.  Where fence waits is
> > one class of boost, but hypothetical other signals like touchscreen
> > (or other) input events could be another class of boost.  A compute
> > workload might be interested in fence wait boosts but could care less
> > about input events.
>
> I think it can only be apps which could have any chance knowing whether

Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-17 Thread Rob Clark
On Fri, Feb 17, 2023 at 4:56 AM Tvrtko Ursulin
 wrote:
>
>
> On 16/02/2023 18:19, Rodrigo Vivi wrote:
> > On Tue, Feb 14, 2023 at 11:14:00AM -0800, Rob Clark wrote:
> >> On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>> From: Tvrtko Ursulin 
> >>>
> >>> In i915 we have this concept of "wait boosting" where we give a priority 
> >>> boost
> >>> for instance to fences which are actively waited upon from userspace. 
> >>> This has
> >>> its pros and cons and can certainly be discussed at length. However fact 
> >>> is
> >>> some workloads really like it.
> >>>
> >>> Problem is that with the arrival of drm syncobj and a new userspace 
> >>> waiting
> >>> entry point it added, the waitboost mechanism was bypassed. Hence I 
> >>> cooked up
> >>> this mini series really (really) quickly to see if some discussion can be 
> >>> had.
> >>>
> >>> It adds a concept of "wait count" to dma fence, which is incremented for 
> >>> every
> >>> explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback 
> >>> (like
> >>> dma_fence_add_callback but from explicit/userspace wait paths).
> >>
> >> I was thinking about a similar thing, but in the context of dma_fence
> >> (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> >> between "housekeeping" poll()ers that don't want to trigger boost but
> >> simply know when to do cleanup, and waiters who are waiting with some
> >> urgency.  I think we could use EPOLLPRI for this purpose.
> >>
> >> Not sure how that translates to waits via the syncobj.  But I think we
> >> want to let userspace give some hint about urgent vs housekeeping
> >> waits.
> >
> > Should the hint be on the waits, or should the hints be on the executed
> > context?
> >
> > In the end we need some way to quickly ramp-up the frequency to avoid
> > the execution bubbles.
> >
> > waitboost is trying to guess that, but in some cases it guess wrong
> > and waste power.
>
> Do we have a list of workloads which shows who benefits and who loses
> from the current implementation of waitboost?
> > btw, this is something that other drivers might need:
> >
> > https://gitlab.freedesktop.org/drm/amd/-/issues/1500#note_825883
> > Cc: Alex Deucher 
>
> I have several issues with the context hint if it would directly
> influence frequency selection in the "more power" direction.
>
> First of all, assume a context hint would replace the waitboost. Which
> applications would need to set it to restore the lost performance and
> how would they set it?
>
> Then I don't even think userspace necessarily knows. Think of a layer
> like OpenCL. It doesn't really know in advance the profile of
> submissions vs waits. It depends on the CPU vs GPU speed, so hardware
> generation, and the actual size of the workload which can be influenced
> by the application (or user) and not the library.
>
> The approach also lends itself well for the "arms race" where every
> application can say "Me me me, I am the most important workload there is!".

since there is discussion happening in two places:

https://gitlab.freedesktop.org/drm/intel/-/issues/8014#note_1777433

What I think you might want is a ctx boost_mask which lets an app or
driver disable certain boost signals/classes.  Where fence waits is
one class of boost, but hypothetical other signals like touchscreen
(or other) input events could be another class of boost.  A compute
workload might be interested in fence wait boosts but could care less
about input events.
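
To make the boost_mask idea concrete, a purely hypothetical UAPI
sketch -- none of these identifiers exist in the real drm/i915 headers,
they are invented here for illustration:

#include <linux/types.h>

/* Hypothetical boost classes a context can opt in to. */
#define EX_BOOST_FENCE_WAIT	(1u << 0)	/* userspace fence/syncobj waits */
#define EX_BOOST_PAGEFLIP	(1u << 1)	/* flip-chain / vblank deadlines */
#define EX_BOOST_INPUT		(1u << 2)	/* input-event driven boosts */

/* Hypothetical per-context parameter selecting allowed boost classes. */
struct ex_context_param_boost {
	__u32 ctx_id;
	__u32 boost_mask;
};

/*
 * e.g. a compute workload keeping fence-wait boosts but ignoring input:
 *	param.boost_mask = EX_BOOST_FENCE_WAIT;
 */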

> The last concern is for me shared with the proposal to expose deadlines
> or high priority waits as explicit uapi knobs. Both come under the "what
> application told us it will do" category vs what it actually does. So I
> think it is slightly weaker than basing decisions off waits.
>
> The current waitboost is a bit detached from that problem because when
> we waitboost for flips we _know_ it is an actual framebuffer in the flip
> chain. When we waitboost for waits we also know someone is waiting. We
> are not trusting userspace telling us this will be a buffer in the flip
> chain or that this is a context which will have a certain duty-cycle.
>
> But yes, even if the input is truthful, latter is still only a
> heuristics because nothing says all waits are important. AFAIU it just
> happened to work well in the past.
>
> I do understan

Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-16 Thread Rob Clark
On Thu, Feb 16, 2023 at 10:20 AM Rodrigo Vivi  wrote:
>
> On Tue, Feb 14, 2023 at 11:14:00AM -0800, Rob Clark wrote:
> > On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> > >
> > > From: Tvrtko Ursulin 
> > >
> > > In i915 we have this concept of "wait boosting" where we give a priority 
> > > boost
> > > for instance to fences which are actively waited upon from userspace. 
> > > This has
> > > its pros and cons and can certainly be discussed at length. However fact 
> > > is
> > > some workloads really like it.
> > >
> > > Problem is that with the arrival of drm syncobj and a new userspace 
> > > waiting
> > > entry point it added, the waitboost mechanism was bypassed. Hence I 
> > > cooked up
> > > this mini series really (really) quickly to see if some discussion can be 
> > > had.
> > >
> > > It adds a concept of "wait count" to dma fence, which is incremented for 
> > > every
> > > explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback 
> > > (like
> > > dma_fence_add_callback but from explicit/userspace wait paths).
> >
> > I was thinking about a similar thing, but in the context of dma_fence
> > (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> > between "housekeeping" poll()ers that don't want to trigger boost but
> > simply know when to do cleanup, and waiters who are waiting with some
> > urgency.  I think we could use EPOLLPRI for this purpose.
> >
> > Not sure how that translates to waits via the syncobj.  But I think we
> > want to let userspace give some hint about urgent vs housekeeping
> > waits.
>
> Should the hint be on the waits, or should the hints be on the executed
> context?

I think it should be on the wait, because different waits may be for
different purposes.  Ideally this could be exposed at the app API
level, but I guess first step is to expose it to userspace.

BR,
-R

> In the end we need some way to quickly ramp-up the frequency to avoid
> the execution bubbles.
>
> waitboost is trying to guess that, but in some cases it guess wrong
> and waste power.
>
> btw, this is something that other drivers might need:
>
> https://gitlab.freedesktop.org/drm/amd/-/issues/1500#note_825883
> Cc: Alex Deucher 
>
> >
> > Also, on a related topic: https://lwn.net/Articles/868468/
> >
> > BR,
> > -R
> >
> > > Individual drivers can then inspect this via dma_fence_wait_count() and 
> > > decide
> > > to wait boost the waits on such fences.
> > >
> > > Again, quickly put together and smoke tested only - no guarantees 
> > > whatsoever and
> > > I will rely on interested parties to test and report if it even works or 
> > > how
> > > well.
> > >
> > > v2:
> > >  * Small fixups based on CI feedback:
> > > * Handle decrement correctly for already signalled case while adding 
> > > callback.
> > > * Remove i915 assert which was making sure struct i915_request does 
> > > not grow.
> > >  * Split out the i915 patch into three separate functional changes.
> > >
> > > Tvrtko Ursulin (5):
> > >   dma-fence: Track explicit waiters
> > >   drm/syncobj: Mark syncobj waits as external waiters
> > >   drm/i915: Waitboost external waits
> > >   drm/i915: Mark waits as explicit
> > >   drm/i915: Wait boost requests waited upon by others
> > >
> > >  drivers/dma-buf/dma-fence.c   | 102 --
> > >  drivers/gpu/drm/drm_syncobj.c |   6 +-
> > >  drivers/gpu/drm/i915/gt/intel_engine_pm.c |   1 -
> > >  drivers/gpu/drm/i915/i915_request.c   |  13 ++-
> > >  include/linux/dma-fence.h |  14 +++
> > >  5 files changed, 101 insertions(+), 35 deletions(-)
> > >
> > > --
> > > 2.34.1
> > >


Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-16 Thread Rob Clark
On Thu, Feb 16, 2023 at 3:19 AM Tvrtko Ursulin
 wrote:
>
>
> On 14/02/2023 19:14, Rob Clark wrote:
> > On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
> >  wrote:
> >>
> >> From: Tvrtko Ursulin 
> >>
> >> In i915 we have this concept of "wait boosting" where we give a priority 
> >> boost
> >> for instance to fences which are actively waited upon from userspace. This 
> >> has
> >> its pros and cons and can certainly be discussed at length. However fact 
> >> is
> >> some workloads really like it.
> >>
> >> Problem is that with the arrival of drm syncobj and a new userspace waiting
> >> entry point it added, the waitboost mechanism was bypassed. Hence I cooked 
> >> up
> >> this mini series really (really) quickly to see if some discussion can be 
> >> had.
> >>
> >> It adds a concept of "wait count" to dma fence, which is incremented for 
> >> every
> >> explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback 
> >> (like
> >> dma_fence_add_callback but from explicit/userspace wait paths).
> >
> > I was thinking about a similar thing, but in the context of dma_fence
> > (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> > between "housekeeping" poll()ers that don't want to trigger boost but
> > simply know when to do cleanup, and waiters who are waiting with some
> > urgency.  I think we could use EPOLLPRI for this purpose.
>
> Sounds plausible to allow distinguishing the two.
>
> I wasn't aware one can set POLLPRI in pollfd.events but it appears it could 
> be allowed:
>
> /* Event types that can be polled for.  These bits may be set in `events'
> to indicate the interesting event types; they will appear in `revents'
> to indicate the status of the file descriptor.  */
> #define POLLIN  0x001   /* There is data to read.  */
> #define POLLPRI 0x002   /* There is urgent data to read.  */
> #define POLLOUT 0x004   /* Writing now will not block.  */
>
> > Not sure how that translates to waits via the syncobj.  But I think we
> > want to let userspace give some hint about urgent vs housekeeping
> > waits.
>
> Probably DRM_SYNCOBJ_WAIT_FLAGS_.
>
> Both look easy additions on top of my series. It would be just a matter of 
> dma_fence_add_callback vs dma_fence_add_wait_callback based on flags, as 
> that's how I called the "explicit userspace wait" one.
>
> It would require userspace changes to make use of it but that is probably
> okay, or even preferable, since it makes the thing less of a heuristic. What
> I don't know however is how feasible it is to wire it up with say OpenCL,
> OpenGL or Vulkan, to allow application writers to distinguish between
> housekeeping vs performance sensitive waits.
>

I think to start with, we consider API level waits as
POLLPRI/DRM_SYNCOBJ_WAIT_PRI until someone types up an extension to
give the app control.  I guess most housekeeping waits will be within
the driver.

(I could see the argument for making "PRI" the default and having a
new flag for non-boosting waits.. but POLLPRI is also some sort of
precedent)
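
As a concrete sketch of the userspace side of this idea (assuming a
sync_file fd, and assuming the kernel treated POLLPRI as the boost
hint, which is only a proposal at this point):

#include <poll.h>

/* Urgent wait: ask for a waitboost via the (proposed) POLLPRI hint. */
static int wait_fence_urgent(int sync_file_fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = sync_file_fd,
		.events = POLLPRI,
	};

	return poll(&pfd, 1, timeout_ms);
}

/* Housekeeping wait: plain POLLIN, no boost implied. */
static int wait_fence_housekeeping(int sync_file_fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd = sync_file_fd,
		.events = POLLIN,
	};

	return poll(&pfd, 1, timeout_ms);
}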

> > Also, on a related topic: https://lwn.net/Articles/868468/
>
> Right, I missed that one.
>
> One thing to mention is that my motivation here wasn't strictly waits
> relating to frame presentation but clvk workloads which constantly move
> between the CPU and GPU. Even outside the compute domain, I think this is a
> workload characteristic where waitboost in general helps. The concept of a
> deadline could still be used I guess, just setting it to some artificially
> early value when the actual time does not exist. But scanning that
> discussion it seems the proposal got bogged down in interactions between
> mode setting and stuff?
>

Yeah, it isn't _exactly_ the same thing but it is the same class of
problem where GPU stalling on something else sends the freq in the
wrong direction.  Probably we could consider wait-boosting as simply
an immediate deadline to unify the two things.
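
As a sketch of that unification, an explicit wait would simply
translate to an immediate deadline.  dma_fence_set_deadline() here
stands in for the deadline hook proposed in the LWN article, not a
merged API:

static void boost_wait(struct dma_fence *fence)
{
	/* An urgent wait is just a deadline of "now". */
	dma_fence_set_deadline(fence, ktime_get());
}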

BR,
-R


> Regards,
>
> Tvrtko


Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-14 Thread Rob Clark
On Tue, Feb 14, 2023 at 11:14 AM Rob Clark  wrote:
>
> On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
>  wrote:
> >
> > From: Tvrtko Ursulin 
> >
> > In i915 we have this concept of "wait boosting" where we give a priority
> > boost, for instance to fences which are actively waited upon from
> > userspace. This has its pros and cons and can certainly be discussed at
> > length. However the fact is some workloads really like it.
> >
> > Problem is that with the arrival of drm syncobj and the new userspace
> > waiting entry point it added, the waitboost mechanism was bypassed.
> > Hence I cooked up this mini series really (really) quickly to see if
> > some discussion can be had.
> >
> > It adds a concept of "wait count" to dma fence, which is incremented for 
> > every
> > explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback (like
> > dma_fence_add_callback but from explicit/userspace wait paths).
>
> I was thinking about a similar thing, but in the context of dma_fence
> (or rather sync_file) fd poll()ing.  How does the kernel differentiate
> between "housekeeping" poll()ers that don't want to trigger boost but
> simply want to know when to do cleanup, and waiters who are waiting
> with some urgency?  I think we could use EPOLLPRI for this purpose.
>
> Not sure how that translates to waits via the syncobj.  But I think we
> want to let userspace give some hint about urgent vs housekeeping
> waits.

So probably the syncobj equiv of this would be to add something along
the lines of DRM_SYNCOBJ_WAIT_FLAGS_WAIT_PRI
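
For illustration, the userspace side could then look like this;
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_PRI is the hypothetical flag suggested
above, and the value picked for it here is made up:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_PRI (1 << 3)	/* hypothetical */

static int syncobj_wait_urgent(int drm_fd, uint32_t handle,
			       int64_t timeout_nsec)
{
	struct drm_syncobj_wait wait = {
		.handles = (uintptr_t)&handle,
		.timeout_nsec = timeout_nsec,
		.count_handles = 1,
		.flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
			 DRM_SYNCOBJ_WAIT_FLAGS_WAIT_PRI,
	};

	return ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
}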

BR,
-R

> Also, on a related topic: https://lwn.net/Articles/868468/
>
> BR,
> -R
>
> > Individual drivers can then inspect this via dma_fence_wait_count() and 
> > decide
> > to wait boost the waits on such fences.
> >
> > Again, quickly put together and smoke tested only - no guarantees 
> > whatsoever and
> > I will rely on interested parties to test and report if it even works or how
> > well.
> >
> > v2:
> >  * Small fixups based on CI feedback:
> > * Handle decrement correctly for already signalled case while adding 
> > callback.
> > * Remove i915 assert which was making sure struct i915_request does not 
> > grow.
> >  * Split out the i915 patch into three separate functional changes.
> >
> > Tvrtko Ursulin (5):
> >   dma-fence: Track explicit waiters
> >   drm/syncobj: Mark syncobj waits as external waiters
> >   drm/i915: Waitboost external waits
> >   drm/i915: Mark waits as explicit
> >   drm/i915: Wait boost requests waited upon by others
> >
> >  drivers/dma-buf/dma-fence.c   | 102 --
> >  drivers/gpu/drm/drm_syncobj.c |   6 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_pm.c |   1 -
> >  drivers/gpu/drm/i915/i915_request.c   |  13 ++-
> >  include/linux/dma-fence.h |  14 +++
> >  5 files changed, 101 insertions(+), 35 deletions(-)
> >
> > --
> > 2.34.1
> >


Re: [Intel-gfx] [RFC v2 0/5] Waitboost drm syncobj waits

2023-02-14 Thread Rob Clark
On Fri, Feb 10, 2023 at 5:07 AM Tvrtko Ursulin
 wrote:
>
> From: Tvrtko Ursulin 
>
> In i915 we have this concept of "wait boosting" where we give a priority
> boost, for instance to fences which are actively waited upon from
> userspace. This has its pros and cons and can certainly be discussed at
> length. However the fact is some workloads really like it.
>
> Problem is that with the arrival of drm syncobj and the new userspace
> waiting entry point it added, the waitboost mechanism was bypassed. Hence
> I cooked up this mini series really (really) quickly to see if some
> discussion can be had.
>
> It adds a concept of "wait count" to dma fence, which is incremented for every
> explicit dma_fence_enable_sw_signaling and dma_fence_add_wait_callback (like
> dma_fence_add_callback but from explicit/userspace wait paths).

I was thinking about a similar thing, but in the context of dma_fence
(or rather sync_file) fd poll()ing.  How does the kernel differentiate
between "housekeeping" poll()ers that don't want to trigger boost but
simply want to know when to do cleanup, and waiters who are waiting
with some urgency?  I think we could use EPOLLPRI for this purpose.

Not sure how that translates to waits via the syncobj.  But I think we
want to let userspace give some hint about urgent vs housekeeping
waits.

Also, on a related topic: https://lwn.net/Articles/868468/

BR,
-R

> Individual drivers can then inspect this via dma_fence_wait_count() and decide
> to wait boost the waits on such fences.
>
> Again, quickly put together and smoke tested only - no guarantees whatsoever 
> and
> I will rely on interested parties to test and report if it even works or how
> well.
>
> v2:
>  * Small fixups based on CI feedback:
> * Handle decrement correctly for already signalled case while adding 
> callback.
> * Remove i915 assert which was making sure struct i915_request does not 
> grow.
>  * Split out the i915 patch into three separate functional changes.
>
> Tvrtko Ursulin (5):
>   dma-fence: Track explicit waiters
>   drm/syncobj: Mark syncobj waits as external waiters
>   drm/i915: Waitboost external waits
>   drm/i915: Mark waits as explicit
>   drm/i915: Wait boost requests waited upon by others
>
>  drivers/dma-buf/dma-fence.c   | 102 --
>  drivers/gpu/drm/drm_syncobj.c |   6 +-
>  drivers/gpu/drm/i915/gt/intel_engine_pm.c |   1 -
>  drivers/gpu/drm/i915/i915_request.c   |  13 ++-
>  include/linux/dma-fence.h |  14 +++
>  5 files changed, 101 insertions(+), 35 deletions(-)
>
> --
> 2.34.1
>


Re: [Intel-gfx] [PATCH] drm/i915: Move fd_install after last use of fence

2023-02-03 Thread Rob Clark
On Fri, Feb 3, 2023 at 8:49 AM Rob Clark  wrote:
>
> From: Rob Clark 
>
> Because eb_composite_fence_create() drops the fence_array reference
> after creation of the sync_file, only the sync_file holds a ref to the
> fence.  But fd_install() makes that reference visible to userspace, so
> it must be the last thing we do with the fence.
>

Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")

> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index f266b68cf012..0f2e056c02dd 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3476,38 +3476,38 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>
>  err_request:
> eb_requests_get(&eb);
> err = eb_requests_add(&eb, err);
>
> if (eb.fences)
> signal_fence_array(&eb, eb.composite_fence ?
>eb.composite_fence :
>&eb.requests[0]->fence);
>
> +   if (unlikely(eb.gem_context->syncobj)) {
> +   drm_syncobj_replace_fence(eb.gem_context->syncobj,
> + eb.composite_fence ?
> + eb.composite_fence :
> + &eb.requests[0]->fence);
> +   }
> +
> if (out_fence) {
> if (err == 0) {
> fd_install(out_fence_fd, out_fence->file);
> args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
> args->rsvd2 |= (u64)out_fence_fd << 32;
> out_fence_fd = -1;
> } else {
> fput(out_fence->file);
> }
> }
>
> -   if (unlikely(eb.gem_context->syncobj)) {
> -   drm_syncobj_replace_fence(eb.gem_context->syncobj,
> - eb.composite_fence ?
> - eb.composite_fence :
> - &eb.requests[0]->fence);
> -   }
> -
> if (!out_fence && eb.composite_fence)
> dma_fence_put(eb.composite_fence);
>
> eb_requests_put();
>
>  err_vma:
> eb_release_vmas(&eb, true);
> WARN_ON(err == -EDEADLK);
> i915_gem_ww_ctx_fini(&eb.ww);
>
> --
> 2.38.1
>


[Intel-gfx] [PATCH] drm/i915: Move fd_install after last use of fence

2023-02-03 Thread Rob Clark
From: Rob Clark 

Because eb_composite_fence_create() drops the fence_array reference
after creation of the sync_file, only the sync_file holds a ref to the
fence.  But fd_install() makes that reference visible to userspace, so
it must be the last thing we do with the fence.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f266b68cf012..0f2e056c02dd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -3476,38 +3476,38 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 
 err_request:
eb_requests_get(&eb);
err = eb_requests_add(&eb, err);

if (eb.fences)
signal_fence_array(&eb, eb.composite_fence ?
   eb.composite_fence :
   &eb.requests[0]->fence);

+   if (unlikely(eb.gem_context->syncobj)) {
+   drm_syncobj_replace_fence(eb.gem_context->syncobj,
+ eb.composite_fence ?
+ eb.composite_fence :
+ &eb.requests[0]->fence);
+   }
+
if (out_fence) {
if (err == 0) {
fd_install(out_fence_fd, out_fence->file);
args->rsvd2 &= GENMASK_ULL(31, 0); /* keep in-fence */
args->rsvd2 |= (u64)out_fence_fd << 32;
out_fence_fd = -1;
} else {
fput(out_fence->file);
}
}
 
-   if (unlikely(eb.gem_context->syncobj)) {
-   drm_syncobj_replace_fence(eb.gem_context->syncobj,
- eb.composite_fence ?
- eb.composite_fence :
- &eb.requests[0]->fence);
-   }
-
if (!out_fence && eb.composite_fence)
dma_fence_put(eb.composite_fence);
 
eb_requests_put();
 
 err_vma:
eb_release_vmas(&eb, true);
WARN_ON(err == -EDEADLK);
i915_gem_ww_ctx_fini(&eb.ww);
 
-- 
2.38.1
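
For reference, the general shape of the pattern the patch above
restores, as a simplified sketch (use_fence() is a made-up placeholder
for whatever the driver still needs to do with the fence):

	fd = get_unused_fd_flags(O_CLOEXEC);
	sync_file = sync_file_create(fence);
	dma_fence_put(fence);		/* sync_file now holds the only ref */

	use_fence(sync_file->fence);	/* every remaining kernel-side use... */

	/*
	 * ...and only then publish the fd: after fd_install() userspace
	 * can close() it at any time and drop the last fence reference.
	 */
	fd_install(fd, sync_file->file);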



[Intel-gfx] [PATCH] drm/i915: Fix potential bit_17 double-free

2023-01-27 Thread Rob Clark
From: Rob Clark 

A userspace with multiple threads racing I915_GEM_SET_TILING to set the
tiling to I915_TILING_NONE could trigger a double free of the bit_17
bitmask.  (Or conversely leak memory on the transition to tiled.)  Move
allocation/free'ing of the bitmask within the section protected by the
obj lock.

Fixes: e9b73c67390a ("drm/i915: Reduce memory pressure during shrinker by 
preallocating swizzle pages")
Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/gem/i915_gem_tiling.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c 
b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
index fd42b89b7162..bc21b1c2350a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
@@ -298,36 +298,37 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object 
*obj,
vma->fence_alignment =
i915_gem_fence_alignment(i915,
 vma->size, tiling, stride);
 
if (vma->fence)
vma->fence->dirty = true;
}
spin_unlock(&obj->vma.lock);
 
obj->tiling_and_stride = tiling | stride;
-   i915_gem_object_unlock(obj);
-
-   /* Force the fence to be reacquired for GTT access */
-   i915_gem_object_release_mmap_gtt(obj);
 
/* Try to preallocate memory required to save swizzling on put-pages */
if (i915_gem_object_needs_bit17_swizzle(obj)) {
if (!obj->bit_17) {
obj->bit_17 = bitmap_zalloc(obj->base.size >> 
PAGE_SHIFT,
GFP_KERNEL);
}
} else {
bitmap_free(obj->bit_17);
obj->bit_17 = NULL;
}
 
+   i915_gem_object_unlock(obj);
+
+   /* Force the fence to be reacquired for GTT access */
+   i915_gem_object_release_mmap_gtt(obj);
+
return 0;
 }
 
 /**
  * i915_gem_set_tiling_ioctl - IOCTL handler to set tiling mode
  * @dev: DRM device
  * @data: data pointer for the ioctl
  * @file: DRM file for the ioctl call
  *
  * Sets the tiling mode of an object, returning the required swizzling of
-- 
2.38.1



[Intel-gfx] [PATCH] drm/i915: Avoid potential vm use-after-free

2023-01-19 Thread Rob Clark
From: Rob Clark 

Adding the vm to the vm_xa table makes it visible to userspace, which
could try to race with us to close the vm.  So we need to take our extra
reference before putting it in the table.

Signed-off-by: Rob Clark 
---
Note, you could list commit e1a7ab4fca0c ("drm/i915: Remove the vm open
count") as the "fixed" commit, but really the issue seems to go back
much further (with the fix needing some backporting in the process).

 drivers/gpu/drm/i915/gem/i915_gem_context.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 6250de9b9196..e4b78ab4773b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1861,11 +1861,19 @@ static int get_ppgtt(struct drm_i915_file_private 
*file_priv,
vm = ctx->vm;
GEM_BUG_ON(!vm);
 
+   /*
+* Get a reference for the allocated handle.  Once the handle is
+* visible in the vm_xa table, userspace could try to close it
+* from under our feet, so we need to hold the extra reference
+* first.
+*/
+   i915_vm_get(vm);
+
err = xa_alloc(&file_priv->vm_xa, &id, vm, xa_limit_32b, GFP_KERNEL);
-   if (err)
+   if (err) {
+   i915_vm_put(vm);
return err;
-
-   i915_vm_get(vm);
+   }
 
GEM_BUG_ON(id == 0); /* reserved for invalid/unassigned ppgtt */
args->value = id;
-- 
2.38.1



Re: [Intel-gfx] [PATCH] drm/i915: Fix potential context UAFs

2023-01-04 Thread Rob Clark
On Wed, Jan 4, 2023 at 1:34 AM Tvrtko Ursulin
 wrote:
>
>
> On 03/01/2023 23:49, Rob Clark wrote:
> > From: Rob Clark 
> >
> > gem_context_register() makes the context visible to userspace, at which
> > point a separate thread can trigger the I915_GEM_CONTEXT_DESTROY ioctl.
> > So we need to ensure that nothing uses the ctx ptr after this.  And we
> > need to ensure that adding the ctx to the xarray is the *last* thing
> > that gem_context_register() does with the ctx pointer.
>
> Any backtraces from oopses or notes on how it was found to record in the 
> commit message?

It was a UAF bug that was reported to us

https://bugs.chromium.org/p/chromium/issues/detail?id=1401594 (but I
guess security bugs are not going to be visible)

>
> > Signed-off-by: Rob Clark 
>
> Fixes: a4c1cdd34e2c ("drm/i915/gem: Delay context creation (v3)")
> References: 3aa9945a528e ("drm/i915: Separate GEM context construction and 
> registration to userspace")
> Cc:  # v5.15+
>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++--
> >   1 file changed, 18 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 7f2831efc798..6250de9b9196 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1688,6 +1688,10 @@ void i915_gem_init__contexts(struct drm_i915_private 
> > *i915)
> >   init_contexts(&i915->gem.contexts);
> >   }
> >
> > +/*
> > + * Note that this implicitly consumes the ctx reference, by placing
> > + * the ctx in the context_xa.
> > + */
> >   static void gem_context_register(struct i915_gem_context *ctx,
> >struct drm_i915_file_private *fpriv,
> >u32 id)
> > @@ -1703,10 +1707,6 @@ static void gem_context_register(struct 
> > i915_gem_context *ctx,
> >   snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
> >current->comm, pid_nr(ctx->pid));
> >
> > - /* And finally expose ourselves to userspace via the idr */
> > - old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
> > - WARN_ON(old);
> > -
> >   spin_lock(&ctx->client->ctx_lock);
> >   list_add_tail_rcu(&ctx->client_link, &ctx->client->ctx_list);
> >   spin_unlock(&ctx->client->ctx_lock);
> > @@ -1714,6 +1714,10 @@ static void gem_context_register(struct 
> > i915_gem_context *ctx,
> >   spin_lock(&i915->gem.contexts.lock);
> >   list_add_tail(&ctx->link, &i915->gem.contexts.list);
> >   spin_unlock(&i915->gem.contexts.lock);
> > +
> > + /* And finally expose ourselves to userspace via the idr */
> > + old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
> > + WARN_ON(old);
>
> Have you seen that this hunk is needed or just moving it for a good measure? 
> To be clear, it is probably best to move it even if the current placement 
> cannot cause any problems, I am just double-checking if you had any concrete 
> observations here while mulling over easier stable backports if we would omit 
> it.
>

This was actually the originally reported issue; the
finalize_create_context_locked() part was something I found when the
original report prompted me to audit the gem_context_register() call
paths.


> >   }
> >
> >   int i915_gem_context_open(struct drm_i915_private *i915,
> > @@ -2199,14 +2203,22 @@ finalize_create_context_locked(struct 
> > drm_i915_file_private *file_priv,
> >   if (IS_ERR(ctx))
> >   return ctx;
> >
> > + /*
> > +  * One for the xarray and one for the caller.  We need to grab
> > +  * the reference *prior* to making the ctx visible to userspace
> > +  * in gem_context_register(), as at any point after that
> > +  * userspace can try to race us with another thread destroying
> > +  * the context under our feet.
> > +  */
> > + i915_gem_context_get(ctx);
> > +
> >   gem_context_register(ctx, file_priv, id);
> >
> > + old = xa_erase(&file_priv->proto_context_xa, id);
> >   GEM_BUG_ON(old != pc);
> >   proto_context_close(file_priv->dev_priv, pc);
> >
> > - /* One for the xarray and one for the caller */
> > - return i915_gem_context_get(ctx);
> > + return ctx;
>
> Otherwise userspace can look up a context which hasn't had its reference
> count increased, yep. I can add the Fixes: and Stable: tags while merging if
> there are no complaints.
>
> Reviewed-by: Tvrtko Ursulin 

Thanks

BR,
-R

>
> Regards,
>
> Tvrtko
>
> >   }
> >
> >   struct i915_gem_context *


[Intel-gfx] [PATCH] drm/i915: Fix potential context UAFs

2023-01-03 Thread Rob Clark
From: Rob Clark 

gem_context_register() makes the context visible to userspace, at which
point a separate thread can trigger the I915_GEM_CONTEXT_DESTROY ioctl.
So we need to ensure that nothing uses the ctx ptr after this.  And we
need to ensure that adding the ctx to the xarray is the *last* thing
that gem_context_register() does with the ctx pointer.

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7f2831efc798..6250de9b9196 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1688,6 +1688,10 @@ void i915_gem_init__contexts(struct drm_i915_private 
*i915)
init_contexts(&i915->gem.contexts);
 }
 
+/*
+ * Note that this implicitly consumes the ctx reference, by placing
+ * the ctx in the context_xa.
+ */
 static void gem_context_register(struct i915_gem_context *ctx,
 struct drm_i915_file_private *fpriv,
 u32 id)
@@ -1703,10 +1707,6 @@ static void gem_context_register(struct i915_gem_context 
*ctx,
snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
 current->comm, pid_nr(ctx->pid));
 
-   /* And finally expose ourselves to userspace via the idr */
-   old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
-   WARN_ON(old);
-
spin_lock(&ctx->client->ctx_lock);
list_add_tail_rcu(&ctx->client_link, &ctx->client->ctx_list);
spin_unlock(&ctx->client->ctx_lock);
@@ -1714,6 +1714,10 @@ static void gem_context_register(struct i915_gem_context 
*ctx,
spin_lock(&i915->gem.contexts.lock);
list_add_tail(&ctx->link, &i915->gem.contexts.list);
spin_unlock(&i915->gem.contexts.lock);
+
+   /* And finally expose ourselves to userspace via the idr */
+   old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL);
+   WARN_ON(old);
 }
 
 int i915_gem_context_open(struct drm_i915_private *i915,
@@ -2199,14 +2203,22 @@ finalize_create_context_locked(struct 
drm_i915_file_private *file_priv,
if (IS_ERR(ctx))
return ctx;
 
+   /*
+* One for the xarray and one for the caller.  We need to grab
+* the reference *prior* to making the ctx visible to userspace
+* in gem_context_register(), as at any point after that
+* userspace can try to race us with another thread destroying
+* the context under our feet.
+*/
+   i915_gem_context_get(ctx);
+
gem_context_register(ctx, file_priv, id);
 
old = xa_erase(&file_priv->proto_context_xa, id);
GEM_BUG_ON(old != pc);
proto_context_close(file_priv->dev_priv, pc);
 
-   /* One for the xarray and one for the caller */
-   return i915_gem_context_get(ctx);
+   return ctx;
 }
 
 struct i915_gem_context *
-- 
2.38.1



Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2022-12-23 Thread Rob Clark
On Thu, Dec 22, 2022 at 2:29 PM Matthew Brost  wrote:
>
> In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
> seems a bit odd but let us explain the reasoning below.
>
> 1. In XE the submission order from multiple drm_sched_entity is not
> guaranteed to be the same as the completion order, even if targeting the
> same hardware engine. This is because in XE we have a firmware scheduler,
> the GuC, which is allowed to reorder, timeslice, and preempt submissions.
> If using a shared drm_gpu_scheduler across multiple drm_sched_entity, the
> TDR falls apart as the TDR expects submission order == completion order.
> Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
> problem.
>
> 2. In XE submissions are done via programming a ring buffer (circular
> buffer), and a drm_gpu_scheduler provides a limit on the number of jobs;
> if that limit is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control
> on the ring for free.
>
> A problem with this design is that currently a drm_gpu_scheduler uses a
> kthread for submission / job cleanup. This doesn't scale if a large
> number of drm_gpu_scheduler instances are used. To work around the
> scaling issue, use a worker rather than kthread for submission / job
> cleanup.

You might want to enable CONFIG_DRM_MSM in your kconfig, I think you
missed a part

BR,
-R
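
As a rough sketch of the worker-based direction the cover letter above
describes (names here are illustrative, not the actual drm_sched
changes in this series):

struct sched_submit {
	struct workqueue_struct *wq;
	struct work_struct work;
};

static void submit_work_fn(struct work_struct *w)
{
	struct sched_submit *s = container_of(w, struct sched_submit, work);

	/* pop ready jobs, push them to the hw, clean up signalled jobs */
	(void)s;
}

static int sched_submit_init(struct sched_submit *s)
{
	/* an ordered wq preserves submission order within one scheduler */
	s->wq = alloc_ordered_workqueue("sched-submit", 0);
	if (!s->wq)
		return -ENOMEM;
	INIT_WORK(&s->work, submit_work_fn);
	return 0;
}

static void sched_submit_kick(struct sched_submit *s)
{
	/* cheap per-scheduler kick; no dedicated kthread needed */
	queue_work(s->wq, &s->work);
}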

> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  14 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  12 +-
>  drivers/gpu/drm/scheduler/sched_main.c  | 124 
>  include/drm/gpu_scheduler.h |  13 +-
>  4 files changed, 93 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index f60753f97ac5..9c2a10aeb0b3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1489,9 +1489,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
> *m, void *unused)
> for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> struct amdgpu_ring *ring = adev->rings[i];
>
> -   if (!ring || !ring->sched.thread)
> +   if (!ring || !ring->sched.ready)
> continue;
> -   kthread_park(ring->sched.thread);
> +   drm_sched_run_wq_stop(&ring->sched);
> }
>
> seq_printf(m, "run ib test:\n");
> @@ -1505,9 +1505,9 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file 
> *m, void *unused)
> for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> struct amdgpu_ring *ring = adev->rings[i];
>
> -   if (!ring || !ring->sched.thread)
> +   if (!ring || !ring->sched.ready)
> continue;
> -   kthread_unpark(ring->sched.thread);
> +   drm_sched_run_wq_start(&ring->sched);
> }
>
> up_write(&adev->reset_domain->sem);
> @@ -1727,7 +1727,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 
> val)
>
> ring = adev->rings[val];
>
> -   if (!ring || !ring->funcs->preempt_ib || !ring->sched.thread)
> +   if (!ring || !ring->funcs->preempt_ib || !ring->sched.ready)
> return -EINVAL;
>
> /* the last preemption failed */
> @@ -1745,7 +1745,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 
> val)
> goto pro_end;
>
> /* stop the scheduler */
> -   kthread_park(ring->sched.thread);
> +   drm_sched_run_wq_stop(&ring->sched);
>
> /* preempt the IB */
> r = amdgpu_ring_preempt_ib(ring);
> @@ -1779,7 +1779,7 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 
> val)
>
>  failure:
> /* restart the scheduler */
> -   kthread_unpark(ring->sched.thread);
> +   drm_sched_run_wq_start(&ring->sched);
>
> up_read(&adev->reset_domain->sem);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 076ae400d099..9552929ccf87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4577,7 +4577,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
> *adev)
> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> struct amdgpu_ring *ring = adev->rings[i];
>
> -   if (!ring || !ring->sched.thread)
> +   if (!ring || !ring->sched.ready)
> continue;
>
> spin_lock(&ring->sched.job_list_lock);
> @@ -4708,7 +4708,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
> *adev,
> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> struct amdgpu_ring *ring = adev->rings[i];
>
> -   if (!ring || !ring->sched.thread)
> +   if (!ring || !ring->sched.ready)
> continue;
>
> /*clear job fence from fence drv to avoid 

[Intel-gfx] [PATCH] drm/i915: Remove __dma_fence_is_chain()

2022-06-28 Thread Rob Clark
From: Rob Clark 

drive-by cleanup

Signed-off-by: Rob Clark 
---
 drivers/gpu/drm/i915/gem/i915_gem_wait.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c 
b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 319936f91ac5..667841780514 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -73,11 +73,6 @@ static void fence_set_priority(struct dma_fence *fence,
rcu_read_unlock();
 }
 
-static inline bool __dma_fence_is_chain(const struct dma_fence *fence)
-{
-   return fence->ops == &dma_fence_chain_ops;
-}
-
 void i915_gem_fence_wait_priority(struct dma_fence *fence,
  const struct i915_sched_attr *attr)
 {
@@ -93,7 +88,7 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence,
 
for (i = 0; i < array->num_fences; i++)
fence_set_priority(array->fences[i], attr);
-   } else if (__dma_fence_is_chain(fence)) {
+   } else if (dma_fence_is_chain(fence)) {
struct dma_fence *iter;
 
/* The chain is ordered; if we boost the last, we boost all */
-- 
2.36.1



Re: [Intel-gfx] [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Rob Clark
On Tue, Jun 28, 2022 at 5:51 AM Dmitry Osipenko
 wrote:
>
> On 6/28/22 15:31, Robin Murphy wrote:
> > ->8-
> > [   68.295951] ==
> > [   68.295956] WARNING: possible circular locking dependency detected
> > [   68.295963] 5.19.0-rc3+ #400 Not tainted
> > [   68.295972] --
> > [   68.295977] cc1/295 is trying to acquire lock:
> > [   68.295986] 08d7f1a0
> > (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198
> > [   68.296036]
> > [   68.296036] but task is already holding lock:
> > [   68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at:
> > __alloc_pages_slowpath.constprop.0+0x4d8/0x1470
> > [   68.296080]
> > [   68.296080] which lock already depends on the new lock.
> > [   68.296080]
> > [   68.296085]
> > [   68.296085] the existing dependency chain (in reverse order) is:
> > [   68.296090]
> > [   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [   68.296111]fs_reclaim_acquire+0xb8/0x150
> > [   68.296130]dma_resv_lockdep+0x298/0x3fc
> > [   68.296148]do_one_initcall+0xe4/0x5f8
> > [   68.296163]kernel_init_freeable+0x414/0x49c
> > [   68.296180]kernel_init+0x2c/0x148
> > [   68.296195]ret_from_fork+0x10/0x20
> > [   68.296207]
> > [   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
> > [   68.296229]__lock_acquire+0x1724/0x2398
> > [   68.296246]lock_acquire+0x218/0x5b0
> > [   68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378
> > [   68.296277]ww_mutex_lock+0x7c/0x4d8
> > [   68.296291]drm_gem_shmem_free+0x7c/0x198
> > [   68.296304]panfrost_gem_free_object+0x118/0x138
> > [   68.296318]drm_gem_object_free+0x40/0x68
> > [   68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
> > [   68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
> > [   68.296368]do_shrink_slab+0x220/0x808
> > [   68.296381]shrink_slab+0x11c/0x408
> > [   68.296392]shrink_node+0x6ac/0xb90
> > [   68.296403]do_try_to_free_pages+0x1dc/0x8d0
> > [   68.296416]try_to_free_pages+0x1ec/0x5b0
> > [   68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470
> > [   68.296444]__alloc_pages+0x4e0/0x5b8
> > [   68.296455]__folio_alloc+0x24/0x60
> > [   68.296467]vma_alloc_folio+0xb8/0x2f8
> > [   68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68
> > [   68.296498]__handle_mm_fault+0x918/0x12a8
> > [   68.296513]handle_mm_fault+0x130/0x300
> > [   68.296527]do_page_fault+0x1d0/0x568
> > [   68.296539]do_translation_fault+0xa0/0xb8
> > [   68.296551]do_mem_abort+0x68/0xf8
> > [   68.296562]el0_da+0x74/0x100
> > [   68.296572]el0t_64_sync_handler+0x68/0xc0
> > [   68.296585]el0t_64_sync+0x18c/0x190
> > [   68.296596]
> > [   68.296596] other info that might help us debug this:
> > [   68.296596]
> > [   68.296601]  Possible unsafe locking scenario:
> > [   68.296601]
> > [   68.296604]CPU0CPU1
> > [   68.296608]
> > [   68.296612]   lock(fs_reclaim);
> > [   68.296622] lock(reservation_ww_class_mutex);
> > [   68.296633]lock(fs_reclaim);
> > [   68.296644]   lock(reservation_ww_class_mutex);
> > [   68.296654]
> > [   68.296654]  *** DEADLOCK ***
>
> This splat can be ignored for now. I'm aware of it, although I
> haven't looked closely at how to fix it since it's a kind of lockdep
> misreporting.

The lockdep splat could be fixed with something similar to what I've
done in msm, ie. basically just not acquire the lock in the finalizer:

https://patchwork.freedesktop.org/patch/489364/

There is one gotcha to watch for, as danvet pointed out
(scan_objects() could still see the obj in the LRU before the
finalizer removes it), but if scan_objects() does the
kref_get_unless_zero() trick, it is safe.
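
A sketch of that scan-side guard; the lru_node member and the
evict_object() helper are illustrative, not existing API:

static unsigned long scan_lru(struct list_head *lru)
{
	struct drm_gem_object *obj, *tmp;
	unsigned long freed = 0;

	list_for_each_entry_safe(obj, tmp, lru, lru_node) {
		/*
		 * A zero refcount means the object is already in its
		 * finalizer: skip it instead of resurrecting it, and
		 * let the finalizer unlink it from the LRU.
		 */
		if (!kref_get_unless_zero(&obj->refcount))
			continue;

		freed += evict_object(obj);
		drm_gem_object_put(obj);
	}

	return freed;
}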

BR,
-R


Re: [Intel-gfx] [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-20 Thread Rob Clark

On Thu, May 26, 2022 at 4:55 PM Dmitry Osipenko
 wrote:
>
> Introduce a common DRM SHMEM shrinker framework that allows reducing
> code duplication among DRM drivers by replacing their custom shrinker
> implementations with the generic shrinker.
>
> In order to start using the DRM SHMEM shrinker, drivers should:
>
> 1. Implement the new evict() shmem object callback.
> 2. Register the shrinker using drm_gem_shmem_shrinker_register(drm_device).
> 3. Use drm_gem_shmem_set_purgeable(shmem) and similar API functions to
>    activate shrinking of shmem GEMs.
>
> This patch is based on ideas borrowed from Rob Clark's MSM shrinker,
> Thomas Zimmermann's variant of the SHMEM shrinker and Intel's i915 shrinker.
>
> Signed-off-by: Daniel Almeida 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/drm_gem_shmem_helper.c| 540 --
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   9 +-
>  drivers/gpu/drm/virtio/virtgpu_drv.h  |   3 +
>  include/drm/drm_device.h  |   4 +
>  include/drm/drm_gem_shmem_helper.h|  87 ++-
>  5 files changed, 594 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index 555fe212bd98..4cd0b5913492 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -126,6 +126,42 @@ struct drm_gem_shmem_object *drm_gem_shmem_create(struct 
> drm_device *dev, size_t
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
>
> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
> +{
> +   return (shmem->madv >= 0) && shmem->evict &&
> +   shmem->eviction_enabled && shmem->pages_use_count &&
> +   !shmem->pages_pin_count && !shmem->base.dma_buf &&
> +   !shmem->base.import_attach && shmem->sgt && !shmem->evicted;
> +}
> +
> +static void
> +drm_gem_shmem_update_pages_state(struct drm_gem_shmem_object *shmem)
> +{
> +   struct drm_gem_object *obj = &shmem->base;
> +   struct drm_gem_shmem_shrinker *gem_shrinker = 
> obj->dev->shmem_shrinker;
> +
> +   dma_resv_assert_held(shmem->base.resv);
> +
> +   if (!gem_shrinker || obj->import_attach)
> +   return;
> +
> +   mutex_lock(&gem_shrinker->lock);
> +
> +   if (drm_gem_shmem_is_evictable(shmem) ||
> +   drm_gem_shmem_is_purgeable(shmem))
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evictable);
> +   else if (shmem->madv < 0)
> +   list_del_init(&shmem->madv_list);
> +   else if (shmem->evicted)
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evicted);
> +   else if (!shmem->pages)
> +   list_del_init(&shmem->madv_list);
> +   else
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_pinned);
> +
> +   mutex_unlock(&gem_shrinker->lock);
> +}
> +
>  /**
>   * drm_gem_shmem_free - Free resources associated with a shmem GEM object
>   * @shmem: shmem GEM object to free
> @@ -142,6 +178,9 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
> } else {
> dma_resv_lock(shmem->base.resv, NULL);
>
> +   /* take out shmem GEM object from the memory shrinker */
> +   drm_gem_shmem_madvise(shmem, -1);
> +
> WARN_ON(shmem->vmap_use_count);
>
> if (shmem->sgt) {
> @@ -150,7 +189,7 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
> sg_free_table(shmem->sgt);
> kfree(shmem->sgt);
> }
> -   if (shmem->pages)
> +   if (shmem->pages_use_count)
> drm_gem_shmem_put_pages(shmem);
>
> WARN_ON(shmem->pages_use_count);
> @@ -163,18 +202,82 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_free);
>
> -static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
> +/**
> + * drm_gem_shmem_set_evictable() - Make GEM evictable by memory shrinker
> + * @shmem: shmem GEM object
> + *
> + * Tell memory shrinker that this GEM can be evicted. Initially eviction is
> + * disabled for all GEMs. If GEM was purged, then -ENOMEM is returned.
> + *
> + * Returns:
> + * 0 on success or a negative error code on failure.
> + */
> +int drm_gem_shmem_set_evictable(struct drm_gem_shmem_object *shmem)
> +{
> +   dma_resv_lock(shmem->base.resv, NULL);
> +
> +   if (shmem->madv < 0)
> +   return -ENOMEM;
> +
> +   shmem->eviction_enabled = true;
> +
> +   dma_resv_unlock(shmem->base.resv);
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_set_evictable);
> +
> +/**
> + * drm_gem_shmem_set_purgeable() - Make GEM purgeable by memory shrinker
> + * @shmem: shmem GEM object
> + *
> + * Tell memory shrinker that this GEM can be purged. Initially purging is
> + * disabled for all GEMs. If GEM was purged, then -ENOMEM is returned.
> + *
> 
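
For illustration, a driver adopting the proposed helper would follow
the three steps from the cover letter roughly like this (the mydrv_*
names are made up; step 1, the evict() callback, is driver specific
and omitted here):

/* step 2: one shrinker per device, registered at init time */
static int mydrv_init_shrinker(struct drm_device *drm)
{
	return drm_gem_shmem_shrinker_register(drm);
}

/* step 3: e.g. from a MADVISE(DONTNEED)-style ioctl */
static int mydrv_mark_purgeable(struct drm_gem_shmem_object *shmem)
{
	return drm_gem_shmem_set_purgeable(shmem);
}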

Re: [Intel-gfx] [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-20 Thread Rob Clark
On Mon, Jun 20, 2022 at 7:09 AM Dmitry Osipenko
 wrote:
>
> On 6/19/22 20:53, Rob Clark wrote:
> ...
> >> +static unsigned long
> >> +drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker,
> >> +struct shrink_control *sc)
> >> +{
> >> +   struct drm_gem_shmem_shrinker *gem_shrinker = 
> >> to_drm_shrinker(shrinker);
> >> +   struct drm_gem_shmem_object *shmem;
> >> +   unsigned long count = 0;
> >> +
> >> +   if (!mutex_trylock(&gem_shrinker->lock))
> >> +   return 0;
> >> +
> >> +   list_for_each_entry(shmem, &gem_shrinker->lru_evictable, 
> >> madv_list) {
> >> +   count += shmem->base.size;
> >> +
> >> +   if (count >= SHRINK_EMPTY)
> >> +   break;
> >> +   }
> >> +
> >> +   mutex_unlock(&gem_shrinker->lock);
> >
> > As I mentioned on the other thread, count_objects being approximate but
> > lockless and fast is the important thing.  Otherwise when you start
> > hitting the shrinker on many threads, you end up serializing them all,
> > even if you have no pages to return to the system at that point.
>
> Daniel's point for dropping the lockless variant was that we're already
> in trouble if we're hitting the shrinker too often and extra optimizations
> won't bring much benefit to us.

At least with zram swap (which I highly recommend using even if you
are not using a physical swap file/partition), swapin/out is actually
quite fast.  And if you are leaning on zram swap to fit 8GB of chrome
browser on a 4GB device, the shrinker gets hit quite a lot.  Lower
spec (4GB RAM) chromebooks can be under constant memory pressure and
can quite easily get into a situation where you are hitting the
shrinker on many threads simultaneously.  So it is pretty important
for all shrinkers in the system (not just drm driver) to be as
concurrent as possible.  As long as you avoid serializing reclaim on
all the threads, performance can still be quite good, but if you don't
performance will fall off a cliff.

jfwiw, we are seeing pretty good results (iirc 40-70% increase in open
tab counts) with the combination of eviction + multigen LRU[1] +
sizing zram swap to be 2x physical RAM

[1] https://lwn.net/Articles/856931/

> Alright, I'll add back the lockless variant (or will use yours
> drm_gem_lru) in the next revision. The code difference is very small
> after all.
>
> ...
> >> +   /* prevent racing with the dma-buf importing/exporting */
> >> +   if (!mutex_trylock(&gem_shrinker->dev->object_name_lock)) {
> >> +   *lock_contention |= true;
> >> +   goto resv_unlock;
> >> +   }
> >
> > I'm not sure this is a good idea to serialize on object_name_lock.
> > Purgeable buffers should never be shared (imported or exported).  So
> > at best you are avoiding evicting and immediately swapping back in, in
> > a rare case, at the cost of serializing multiple threads trying to
> > reclaim pages in parallel.
>
> The object_name_lock shouldn't cause contention in practice. But objects
> are also pinned on attachment, hence maybe this lock is indeed
> unnecessary.. I'll re-check it.

I'm not worried about contention with export/import/etc, but
contention between multiple threads hitting the shrinker in parallel.
I guess since you are using trylock, it won't *block* the other
threads hitting shrinker, but they'll just end up looping in
do_shrink_slab() because they are hitting contention.

I'd have to do some experiments to see how it works out in practice,
but my gut feel is that it isn't a good idea

BR,
-R

> --
> Best regards,
> Dmitry


Re: [Intel-gfx] [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-19 Thread Rob Clark
On Thu, May 26, 2022 at 4:55 PM Dmitry Osipenko
 wrote:
>
> Introduce a common DRM SHMEM shrinker framework that allows reducing
> code duplication among DRM drivers by replacing their custom shrinker
> implementations with the generic shrinker.
>
> In order to start using the DRM SHMEM shrinker, drivers should:
>
> 1. Implement the new evict() shmem object callback.
> 2. Register the shrinker using drm_gem_shmem_shrinker_register(drm_device).
> 3. Use drm_gem_shmem_set_purgeable(shmem) and similar API functions to
>    activate shrinking of shmem GEMs.
>
> This patch is based on ideas borrowed from Rob Clark's MSM shrinker,
> Thomas Zimmermann's variant of the SHMEM shrinker and Intel's i915 shrinker.
>
> Signed-off-by: Daniel Almeida 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/gpu/drm/drm_gem_shmem_helper.c| 540 --
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   9 +-
>  drivers/gpu/drm/virtio/virtgpu_drv.h  |   3 +
>  include/drm/drm_device.h  |   4 +
>  include/drm/drm_gem_shmem_helper.h|  87 ++-
>  5 files changed, 594 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> b/drivers/gpu/drm/drm_gem_shmem_helper.c
> index 555fe212bd98..4cd0b5913492 100644
> --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> @@ -126,6 +126,42 @@ struct drm_gem_shmem_object *drm_gem_shmem_create(struct 
> drm_device *dev, size_t
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
>
> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
> +{
> +   return (shmem->madv >= 0) && shmem->evict &&
> +   shmem->eviction_enabled && shmem->pages_use_count &&
> +   !shmem->pages_pin_count && !shmem->base.dma_buf &&
> +   !shmem->base.import_attach && shmem->sgt && !shmem->evicted;
> +}
> +
> +static void
> +drm_gem_shmem_update_pages_state(struct drm_gem_shmem_object *shmem)
> +{
> +   struct drm_gem_object *obj = &shmem->base;
> +   struct drm_gem_shmem_shrinker *gem_shrinker = 
> obj->dev->shmem_shrinker;
> +
> +   dma_resv_assert_held(shmem->base.resv);
> +
> +   if (!gem_shrinker || obj->import_attach)
> +   return;
> +
> +   mutex_lock(&gem_shrinker->lock);
> +
> +   if (drm_gem_shmem_is_evictable(shmem) ||
> +   drm_gem_shmem_is_purgeable(shmem))
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evictable);
> +   else if (shmem->madv < 0)
> +   list_del_init(&shmem->madv_list);
> +   else if (shmem->evicted)
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evicted);
> +   else if (!shmem->pages)
> +   list_del_init(&shmem->madv_list);
> +   else
> +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_pinned);
> +
> +   mutex_unlock(&gem_shrinker->lock);
> +}
> +
>  /**
>   * drm_gem_shmem_free - Free resources associated with a shmem GEM object
>   * @shmem: shmem GEM object to free
> @@ -142,6 +178,9 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
> } else {
> dma_resv_lock(shmem->base.resv, NULL);
>
> +   /* take out shmem GEM object from the memory shrinker */
> +   drm_gem_shmem_madvise(shmem, -1);
> +
> WARN_ON(shmem->vmap_use_count);
>
> if (shmem->sgt) {
> @@ -150,7 +189,7 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
> sg_free_table(shmem->sgt);
> kfree(shmem->sgt);
> }
> -   if (shmem->pages)
> +   if (shmem->pages_use_count)
> drm_gem_shmem_put_pages(shmem);
>
> WARN_ON(shmem->pages_use_count);
> @@ -163,18 +202,82 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> *shmem)
>  }
>  EXPORT_SYMBOL_GPL(drm_gem_shmem_free);
>
> -static int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
> +/**
> + * drm_gem_shmem_set_evictable() - Make GEM evictable by memory shrinker
> + * @shmem: shmem GEM object
> + *
> + * Tell memory shrinker that this GEM can be evicted. Initially eviction is
> + * disabled for all GEMs. If GEM was purged, then -ENOMEM is returned.
> + *
> + * Returns:
> + * 0 on success or a negative error code on failure.
> + */
> +int drm_gem_shmem_set_evictable(struct drm_gem_shmem_object *shmem)
> +{
> +   dma_resv_lock(shmem->base.resv, NULL);
> +
> +   if (shmem->madv < 0)
> +   return -ENOMEM;
> +
> +   shmem->eviction_enabled = true;
> +
> +   dma_resv_unlock(shmem->base.resv);
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_set_evictable);
> +
> +/**
> + * drm_gem_shmem_set_purgeable() - Make GEM purgeable by memory shrinker
> + * @shmem: shmem GEM object
> + *
> + * Tell memory shrinker that this GEM can be purged. Initially purging is
> + * disabled for all GEMs. If GEM was purged, then -ENOMEM is returned.
> + *
> + * 

Re: [Intel-gfx] [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-05 Thread Rob Clark
On Sun, Jun 5, 2022 at 9:47 AM Daniel Vetter  wrote:
>
> On Fri, 27 May 2022 at 01:55, Dmitry Osipenko
>  wrote:
> >
> > Introduce a common DRM SHMEM shrinker framework that allows reducing
> > code duplication among DRM drivers by replacing their custom shrinker
> > implementations with the generic shrinker.
> >
> > In order to start using the DRM SHMEM shrinker, drivers should:
> >
> > 1. Implement the new evict() shmem object callback.
> > 2. Register the shrinker using drm_gem_shmem_shrinker_register(drm_device).
> > 3. Use drm_gem_shmem_set_purgeable(shmem) and similar API functions to
> >    activate shrinking of shmem GEMs.
> >
> > This patch is based on ideas borrowed from Rob Clark's MSM shrinker,
> > Thomas Zimmermann's variant of the SHMEM shrinker and Intel's i915 shrinker.
> >
> > Signed-off-by: Daniel Almeida 
> > Signed-off-by: Dmitry Osipenko 
>
> So I guess I get a prize for being blind since forever, because this
> thing has existed since at least 2013. I just stumbled over
> list_lru.[hc], a purpose-built list helper for shrinkers. I think we
> should try to adopt that so that our gpu shrinkers look more like
> shrinkers for everything else.

followup from a bit of irc discussion w/ danvet about list_lru:

* It seems to be missing a way to bail out of iteration before
  nr_to_scan is hit.. which is going to be inconvenient if you
  want to allow active bos on the LRU but bail scanning once
  you encounter the first one.

* Not sure if the numa node awareness is super useful for GEM
  bos

First issue is perhaps not too hard to fix.  But maybe a better
idea is a drm_gem_lru helper type thing which is more tailored
to GEM buffers?
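
A rough sketch of what such a helper could look like, with the
early-bail that the first bullet asks for (all names illustrative):

struct gem_lru {
	struct mutex lock;
	struct list_head list;
};

enum gem_lru_ret { GEM_LRU_NEXT, GEM_LRU_STOP };

static unsigned long
gem_lru_scan(struct gem_lru *lru,
	     enum gem_lru_ret (*fn)(struct drm_gem_object *obj),
	     unsigned long nr_to_scan)
{
	struct drm_gem_object *obj, *tmp;
	unsigned long scanned = 0;

	mutex_lock(&lru->lock);
	list_for_each_entry_safe(obj, tmp, &lru->list, lru_node) {
		if (scanned >= nr_to_scan)
			break;
		/*
		 * Unlike list_lru, the callback can stop the walk early,
		 * e.g. on the first still-active BO.
		 */
		if (fn(obj) == GEM_LRU_STOP)
			break;
		scanned++;
	}
	mutex_unlock(&lru->lock);

	return scanned;
}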

BR,
-R

> Apologies for this, since I fear this might cause a bit of churn.
> Hopefully it's all contained to the list manipulation code in shmem
> helpers, I don't think this should leak any further.
> -Daniel
>
> > ---
> >  drivers/gpu/drm/drm_gem_shmem_helper.c| 540 --
> >  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   9 +-
> >  drivers/gpu/drm/virtio/virtgpu_drv.h  |   3 +
> >  include/drm/drm_device.h  |   4 +
> >  include/drm/drm_gem_shmem_helper.h|  87 ++-
> >  5 files changed, 594 insertions(+), 49 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c 
> > b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > index 555fe212bd98..4cd0b5913492 100644
> > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > @@ -126,6 +126,42 @@ struct drm_gem_shmem_object 
> > *drm_gem_shmem_create(struct drm_device *dev, size_t
> >  }
> >  EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
> >
> > +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
> > +{
> > +   return (shmem->madv >= 0) && shmem->evict &&
> > +   shmem->eviction_enabled && shmem->pages_use_count &&
> > +   !shmem->pages_pin_count && !shmem->base.dma_buf &&
> > +   !shmem->base.import_attach && shmem->sgt && !shmem->evicted;
> > +}
> > +
> > +static void
> > +drm_gem_shmem_update_pages_state(struct drm_gem_shmem_object *shmem)
> > +{
> > +   struct drm_gem_object *obj = &shmem->base;
> > +   struct drm_gem_shmem_shrinker *gem_shrinker = 
> > obj->dev->shmem_shrinker;
> > +
> > +   dma_resv_assert_held(shmem->base.resv);
> > +
> > +   if (!gem_shrinker || obj->import_attach)
> > +   return;
> > +
> > +   mutex_lock(&gem_shrinker->lock);
> > +
> > +   if (drm_gem_shmem_is_evictable(shmem) ||
> > +   drm_gem_shmem_is_purgeable(shmem))
> > +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evictable);
> > +   else if (shmem->madv < 0)
> > +   list_del_init(&shmem->madv_list);
> > +   else if (shmem->evicted)
> > +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_evicted);
> > +   else if (!shmem->pages)
> > +   list_del_init(&shmem->madv_list);
> > +   else
> > +   list_move_tail(&shmem->madv_list, &gem_shrinker->lru_pinned);
> > +
> > +   mutex_unlock(&gem_shrinker->lock);
> > +}
> > +
> >  /**
> >   * drm_gem_shmem_free - Free resources associated with a shmem GEM object
> >   * @shmem: shmem GEM object to free
> > @@ -142,6 +178,9 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> > *shmem)
> > } else {
> > dma_resv_lock(shmem->base.resv, NULL);
> >
> > +   /* take out shmem GEM object from the memory shrinker */
> > +   drm_gem_shmem_madvise(shmem, -1);
> > +
> > WARN_ON(shmem->vmap_use_count);
> >
> > if (shmem->sgt) {
> > @@ -150,7 +189,7 @@ void drm_gem_shmem_free(struct drm_gem_shmem_object 
> > *shmem)
> > sg_free_table(shmem->sgt);
> > kfree(shmem->sgt);
> > }
> > -   if (shmem->pages)
> > +   if (shmem->pages_use_count)
> > drm_gem_shmem_put_pages(shmem);
> >
