On Wed, Dec 12, 2018 at 12:40 PM Zhou, David(ChunMing)
<david1.z...@amd.com> wrote:
>
> + Daniel Rakos and Jason Ekstrand.
>
>  Below is the background, which is from Daniel R, who should be able to
> explain why:
> " ISVs, especially those coming from D3D12, are unsatisfied with the behavior 
> of the Vulkan semaphores as they are unhappy with the fact that for every 
> single dependency they need to use separate semaphores due to their binary 
> nature.
> Compared to that a synchronization primitive like D3D12 monitored fences 
> enable one of those to be used to track a sequence of operations by simply 
> associating timeline values to the completion of individual operations. This 
> allows them to track the lifetime and usage of resources and the ordered 
> completion of sequences.
> Besides that, they also want to use a single synchronization primitive to be 
> able to handle GPU-to-GPU and GPU-to-CPU dependencies, compared to using 
> semaphores for the former and fences for the latter.
> In addition, compared to legacy semaphores, timeline semaphores are proposed 
> to support wait-before-signal, i.e. allow enqueueing a semaphore wait 
> operation with a wait value that is larger than any of the already enqueued 
> signal values. This seems to be a hard requirement for ISVs. Without UMD-side 
> queue batching, and even UMD-side queue batching doesn’t help the situation 
> when such a semaphore is externally shared with another API. Thus in order to 
> properly support wait-before-signal the KMD implementation has to also be 
> able to support such dependencies.
> "

I was tangentially involved in that wg too, so I understand the overall
use-case of vk timelines. I don't understand the exact corner case
here, because I wasn't deeply involved in the details.
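
For reference, my mental model of the D3D12-style monitored fence is
roughly the sketch below - hypothetical helper names, nothing real,
and an actual implementation blocks instead of spinning:

    struct monitored_fence {
            u64 payload; /* 64-bit counter, visible to everyone */
    };

    /* Wait-before-signal: this can be enqueued before any signal
     * with value >= target exists anywhere in the system. */
    void mf_wait(struct monitored_fence *f, u64 target)
    {
            while (READ_ONCE(f->payload) < target)
                    cpu_relax();
    }

    void mf_signal(struct monitored_fence *f, u64 value)
    {
            /* Out-of-order updates are userspace's problem. */
            WRITE_ONCE(f->payload, value);
    }
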
-Daniel

> Btw, we have already added test cases to igt, and this has been tested by 
> many existing tests: the libdrm unit tests, related igt tests, the vulkan 
> cts, and steam games.
>
> -David
> > -----Original Message-----
> > From: Daniel Vetter <dan...@ffwll.ch>
> > Sent: Wednesday, December 12, 2018 7:15 PM
> > To: Koenig, Christian <christian.koe...@amd.com>
> > Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel
> > <dri-de...@lists.freedesktop.org>; amd-gfx list
> > <amd-...@lists.freedesktop.org>; intel-gfx
> > <intel-...@lists.freedesktop.org>; Christian König
> > <ckoenig.leichtzumer...@gmail.com>
> > Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new
> > drm_syncobj_add_point interface v2
> >
> > On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian
> > <christian.koe...@amd.com> wrote:
> > >
> > > Am 12.12.18 um 11:49 schrieb Daniel Vetter:
> > > > On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
> > > >> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> > > >>
> > > >> Use the dma_fence_chain object to create a timeline of fence
> > > >> objects instead of just replacing the existing fence.
> > > >>
> > > >> v2: rebase and cleanup
> > > >>
> > > >> Signed-off-by: Christian König <christian.koe...@amd.com>
> > > > Somewhat jumping back into this; not sure whether we discussed this
> > > > already. I'm a bit unclear on why we have to chain the fences in the
> > > > timeline:
> > > >
> > > > - The timeline stuff is modelled after the WDDM2 monitored fences,
> > > >    which really are just u64 counters in memory somewhere (I think it
> > > >    could be system ram or vram). Because WDDM2 has memory management
> > > >    entirely separated from rendering synchronization, it totally
> > > >    allows userspace to create loops and deadlocks and everything else
> > > >    nasty using this - the memory manager won't deadlock because these
> > > >    monitored fences never leak into the buffer manager. And if CSes
> > > >    deadlock, gpu reset takes care of the mess.
> > > >
> > > > - This has a few consequences, as in they seem to indeed work like a
> > > >    memory location: userspace incrementing out-of-order (because it
> > > >    runs batches updating the same fence on different engines) is
> > > >    totally fine, as is doing anything else "stupid".
> > > >
> > > > - Now on linux we can't allow just anything, because we need to make
> > > >    sure that deadlocks don't leak into the memory manager. But as long
> > > >    as we block until the underlying dma_fence has materialized,
> > > >    nothing userspace can do will lead to such a deadlock. Even if
> > > >    userspace ends up submitting jobs without enough built-in
> > > >    synchronization, leading to out-of-order signalling of fences on
> > > >    that "timeline", I don't think that would pose a problem for us.
> > > >
> > > > Essentially I think we can look at a timeline syncobj as a dma_fence
> > > > container indexed through an integer, and there's no need to enforce
> > > > that the timeline works like a real dma_fence timeline, with all its
> > > > guarantees. It's just a pile of (possibly, if userspace is stupid)
> > > > unrelated dma_fences. You could implement the entire thing in
> > > > userspace after all, except for the "we want to share these timeline
> > > > objects between processes" problem.
> > > >
> > > > tldr; I think we can drop the dma_fence_chain complexity completely.
> > > > Or at least I'm not really understanding why it's needed.
> > > >
> > > > Of course that means drivers cannot treat a drm_syncobj timeline as
> > > > a dma_fence timeline. But given the future fences stuff and all
> > > > that, that's already out of the window anyway.
> > > >
> > > > What am I missing?
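
To make the container idea concrete, this sketch is roughly what I
have in mind - all names made up, explicitly not a patch:

    /* Hypothetical: a timeline syncobj as a plain point -> fence map,
     * with no ordering enforced between the individual points. */
    struct syncobj_point {
            struct list_head link;   /* on a list sorted by value */
            u64 value;
            struct dma_fence *fence; /* NULL until materialized */
    };

A waiter just looks up the fence at its point and blocks until that
fence materializes and signals; everything else is userspace's
problem.
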
> > >
> > > Good question, since that was exactly my initial idea as well.
> > >
> > > Key point is that our Vulkan guys came back and said that this
> > > wouldn't be sufficient, but I honestly don't fully understand why.
> >
> > Hm, sounds like we really need those testcases (vk cts on top of mesa, igt)
> > so we can talk about the exact corner cases we care about and why.
> >
> > I guess one thing that might happen is that userspace leaves out a number
> > and never sets that fence, relying on the >= semantics of the monitored
> > fence to unblock that thread, e.g. when skipping a frame in one of the
> > auxiliary workloads. For that case we'd need to make sure we don't just
> > wait for the given fence to materialize, but also wait for any fences
> > later in the timeline.
> >
> > But we can't decide that without understanding the actual use-case that
> > needs to be supported at the other end of the stack, and how all the bits in
> > between should look.
> >
> > I guess we're back to "uapi design without userspace doesn't make sense" ...
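
Concretely, resolving a wait on point N would then need a lookup along
the lines of this sketch - made-up structures (see above) and locking
elided, just to illustrate the >= semantics:

    /* Sketch: return the first materialized fence at point >= target,
     * assuming tl->points is sorted by value, so that a waiter on a
     * skipped point is still unblocked by a later signal. */
    static struct dma_fence *
    point_find_geq(struct syncobj_timeline *tl, u64 target)
    {
            struct syncobj_point *pt;

            list_for_each_entry(pt, &tl->points, link)
                    if (pt->value >= target && pt->fence)
                            return dma_fence_get(pt->fence);
            return NULL; /* nothing there yet, keep blocking */
    }
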
> >
> > > Anyway that's why David came up with using the fence array to wait for
> > > all previously added fences, which I then later on extended into this
> > > chain container.
> > >
> > > I have to admit that it is implemented much more defensively this way,
> > > e.g. there are far fewer things userspace can do wrong.
> > >
> > > The principal idea is that when they mess things up they are always
> > > going to wait more than necessary, but never less.
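
To spell out those semantics: with the chain container a wait on point
N resolves to the first chain node with seqno >= N, and that node's
fence only signals once all earlier nodes have signalled too - which is
where the "more than necessary, but never less" comes from. Roughly,
assuming the dma_fence_chain helpers from earlier in this series and
with error handling elided:

    struct dma_fence *fence = drm_syncobj_fence_get(syncobj);

    /* Resolve timeline point N; waiting on the resulting chain node
     * implicitly waits for all previous points as well. */
    if (!dma_fence_chain_find_seqno(&fence, point) && fence)
            dma_fence_wait(fence, true);
    dma_fence_put(fence);
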
> >
> > That seems against the spirit of vulkan, which is very much about "you
> > get all the pieces". It also might dig us a hole in the future, if we
> > ever get around to moving towards a WDDM2 style memory management model.
> > For future-proofing I think it would make sense to implement the minimal
> > uapi we need for vk timelines, not the strictest guarantees we can get
> > away with (without performance impact) on current drivers.
> > -Daniel
> >
> >
> > > Christian.
> > >
> > > > -Daniel
> > > >
> > > >> ---
> > > >>   drivers/gpu/drm/drm_syncobj.c | 37 +++++++++++++++++++++++++++++++++++
> > > >>   include/drm/drm_syncobj.h     |  5 +++++
> > > >>   2 files changed, 42 insertions(+)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> > > >> index e19525af0cce..51f798e2194f 100644
> > > >> --- a/drivers/gpu/drm/drm_syncobj.c
> > > >> +++ b/drivers/gpu/drm/drm_syncobj.c
> > > >> @@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
> > > >>      spin_unlock(&syncobj->lock);
> > > >>   }
> > > >>
> > > >> +/**
> > > >> + * drm_syncobj_add_point - add new timeline point to the syncobj
> > > >> + * @syncobj: sync object to add timeline point do
> > > >> + * @chain: chain node to use to add the point
> > > >> + * @fence: fence to encapsulate in the chain node
> > > >> + * @point: sequence number to use for the point
> > > >> + *
> > > >> + * Add the chain node as new timeline point to the syncobj.
> > > >> + */
> > > >> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> > > >> +                       struct dma_fence_chain *chain,
> > > >> +                       struct dma_fence *fence,
> > > >> +                       uint64_t point)
> > > >> +{
> > > >> +    struct syncobj_wait_entry *cur, *tmp;
> > > >> +    struct dma_fence *prev;
> > > >> +
> > > >> +    dma_fence_get(fence);
> > > >> +
> > > >> +    spin_lock(&syncobj->lock);
> > > >> +
> > > >> +    prev = rcu_dereference_protected(syncobj->fence,
> > > >> +                                     lockdep_is_held(&syncobj->lock));
> > > >> +    dma_fence_chain_init(chain, prev, fence, point);
> > > >> +    rcu_assign_pointer(syncobj->fence, &chain->base);
> > > >> +
> > > >> +    list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> > > >> +            list_del_init(&cur->node);
> > > >> +            syncobj_wait_syncobj_func(syncobj, cur);
> > > >> +    }
> > > >> +    spin_unlock(&syncobj->lock);
> > > >> +
> > > >> +    /* Walk the chain once to trigger garbage collection */
> > > >> +    dma_fence_chain_for_each(prev, fence);
> > > >> +}
> > > >> +EXPORT_SYMBOL(drm_syncobj_add_point);
> > > >> +
> > > >>   /**
> > > >>    * drm_syncobj_replace_fence - replace fence in a sync object.
> > > >>    * @syncobj: Sync object to replace fence in
> > > >> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
> > > >> index 7c6ed845c70d..8acb4ae4f311 100644
> > > >> --- a/include/drm/drm_syncobj.h
> > > >> +++ b/include/drm/drm_syncobj.h
> > > >> @@ -27,6 +27,7 @@
> > > >>   #define __DRM_SYNCOBJ_H__
> > > >>
> > > >>   #include "linux/dma-fence.h"
> > > >> +#include "linux/dma-fence-chain.h"
> > > >>
> > > >>   /**
> > > >>    * struct drm_syncobj - sync object.
> > > >> @@ -110,6 +111,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
> > > >>
> > > >>   struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
> > > >>                                   u32 handle);
> > > >> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> > > >> +                       struct dma_fence_chain *chain,
> > > >> +                       struct dma_fence *fence,
> > > >> +                       uint64_t point);
> > > >>   void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
> > > >>                             struct dma_fence *fence);
> > > >>   int drm_syncobj_find_fence(struct drm_file *file_private,
> > > >> --
> > > >> 2.17.1
> > > >>
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch