On Wed, Dec 12, 2018 at 12:40 PM Zhou, David(ChunMing) <david1.z...@amd.com> wrote:
>
> + Daniel Rakos and Jason Ekstrand.
>
> Below is the background, which is from Daniel Rakos, who should be able to
> explain why:
> "ISVs, especially those coming from D3D12, are unsatisfied with the behavior
> of Vulkan semaphores: because of their binary nature, they need to use a
> separate semaphore for every single dependency.
> Compared to that, a synchronization primitive like the D3D12 monitored fence
> enables a single object to track a sequence of operations, by simply
> associating timeline values with the completion of individual operations.
> This allows them to track the lifetime and usage of resources and the
> ordered completion of sequences.
> Besides that, they also want to use a single synchronization primitive to
> handle both GPU-to-GPU and GPU-to-CPU dependencies, instead of using
> semaphores for the former and fences for the latter.
> In addition, compared to legacy semaphores, timeline semaphores are proposed
> to support wait-before-signal, i.e. to allow enqueueing a semaphore wait
> operation with a wait value that is larger than any of the already enqueued
> signal values. This seems to be a hard requirement for ISVs, and it cannot
> be solved with UMD-side queue batching alone: even UMD-side queue batching
> doesn't help when such a semaphore is externally shared with another API.
> Thus, in order to properly support wait-before-signal, the KMD
> implementation has to be able to support such dependencies as well.
> "
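For concreteness, the model described above maps to roughly the following
application-side usage. The names are from the VK_KHR_timeline_semaphore
extension as it eventually shipped, so treat this purely as an illustration
of the intended semantics, not as the exact API that was on the table in the
wg ("device" is assumed to be a valid VkDevice):

    /* Illustration only: timeline semaphore semantics, using the names
     * of the later VK_KHR_timeline_semaphore extension. */
    VkSemaphoreTypeCreateInfoKHR type_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO_KHR,
        .semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE_KHR,
        .initialValue = 0,
    };
    VkSemaphoreCreateInfo create_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
        .pNext = &type_info,
    };
    VkSemaphore timeline;
    vkCreateSemaphore(device, &create_info, NULL, &timeline);

    /* One object tracks a whole sequence: the batch completing step N
     * signals value N.  A wait for value 8 may be enqueued before any
     * signal operation for 8 exists (wait-before-signal). */
    uint64_t value = 8;
    VkSemaphoreWaitInfoKHR wait_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR,
        .semaphoreCount = 1,
        .pSemaphores = &timeline,
        .pValues = &value,
    };
    vkWaitSemaphoresKHR(device, &wait_info, UINT64_MAX);

The submit-side equivalent chains a VkTimelineSemaphoreSubmitInfoKHR with
per-submit wait and signal values into VkSubmitInfo.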
I was tangentially involved in that wg too, and I understand the overall
use-case of vk timelines. I don't understand the exact corner case here,
because I wasn't deeply involved in the details.
-Daniel

> Btw, we already added test cases to igt, and tested with many existing
> tests, like the libdrm unit tests, the igt tests, the vulkan cts, and
> steam games.
>
> -David
>
> -----Original Message-----
> > From: Daniel Vetter <dan...@ffwll.ch>
> > Sent: Wednesday, December 12, 2018 7:15 PM
> > To: Koenig, Christian <christian.koe...@amd.com>
> > Cc: Zhou, David(ChunMing) <david1.z...@amd.com>; dri-devel
> > <dri-de...@lists.freedesktop.org>; amd-gfx list
> > <amd-...@lists.freedesktop.org>; intel-gfx
> > <intel-...@lists.freedesktop.org>; Christian König
> > <ckoenig.leichtzumer...@gmail.com>
> > Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new
> > drm_syncobj_add_point interface v2
> >
> > On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian
> > <christian.koe...@amd.com> wrote:
> > >
> > > On 12.12.18 at 11:49, Daniel Vetter wrote:
> > > > On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
> > > >> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> > > >>
> > > >> Use the dma_fence_chain object to create a timeline of fence
> > > >> objects instead of just replacing the existing fence.
> > > >>
> > > >> v2: rebase and cleanup
> > > >>
> > > >> Signed-off-by: Christian König <christian.koe...@amd.com>
> > > > Somewhat jumping back into this. Not sure whether we discussed this
> > > > already or not. I'm a bit unclear on why we have to chain the fences
> > > > in the timeline:
> > > >
> > > > - The timeline stuff is modelled after the WDDM2 monitored fences,
> > > >   which really are just u64 counters in memory somewhere (I think it
> > > >   could be system ram or vram). Because WDDM2 has the memory
> > > >   management entirely separated from rendering synchronization, it
> > > >   totally allows userspace to create loops and deadlocks and
> > > >   everything else nasty using this - the memory manager won't
> > > >   deadlock because these monitored fences never leak into the buffer
> > > >   manager. And if CSes deadlock, gpu reset takes care of the mess.
> > > >
> > > > - This has a few consequences, as in they seem to indeed work like a
> > > >   memory location: userspace incrementing out-of-order (because they
> > > >   run batches updating the same fence on different engines) is
> > > >   totally fine, as is doing anything else "stupid".
> > > >
> > > > - Now on linux we can't allow just anything, because we need to make
> > > >   sure that deadlocks don't leak into the memory manager. But as long
> > > >   as we block until the underlying dma_fence has materialized,
> > > >   nothing userspace can do will lead to such a deadlock. Even if
> > > >   userspace ends up submitting jobs without enough built-in
> > > >   synchronization, leading to out-of-order signalling of fences on
> > > >   that "timeline". And I don't think that would pose a problem for
> > > >   us.
> > > >
> > > > Essentially I think we can look at a timeline syncobj as a dma_fence
> > > > container indexed through an integer, and there's no need to enforce
> > > > that the timeline works like a real dma_fence timeline, with all its
> > > > guarantees. It's just a pile of (possibly, if userspace is stupid)
> > > > unrelated dma_fences. You could implement the entire thing in
> > > > userspace after all, except for the "we want to share these timeline
> > > > objects between processes" problem.
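To spell out the monitored-fence model referred to in the quoted mail, it
is conceptually nothing more than the following (a rough sketch with
made-up names, not code from any actual driver):

    /* Conceptual sketch of a WDDM2-style monitored fence: a bare
     * 64-bit counter in memory (system ram or vram). */
    struct monitored_fence {
        u64 *value;    /* CPU- and GPU-visible memory location */
    };

    /* A wait for wait_value completes as soon as the current value is
     * >= wait_value, regardless of the order in which values were
     * written - signalling 10 also unblocks waits on 3..9, even if
     * some of those points were never written at all. */
    static bool monitored_fence_signaled(const struct monitored_fence *f,
                                         u64 wait_value)
    {
        return READ_ONCE(*f->value) >= wait_value;
    }

Everything beyond that single ">=" comparison - ordering, who writes the
value, and when - is left to userspace, which is exactly what permits the
out-of-order and "stupid" uses listed in the quote.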
> > > > tldr; I think we can drop the dma_fence_chain complexity completely.
> > > > Or at least I'm not really understanding why it's needed.
> > > >
> > > > Of course that means drivers cannot treat a drm_syncobj timeline as
> > > > a dma_fence timeline. But given the future fences stuff and all
> > > > that, that's already out of the window anyway.
> > > >
> > > > What am I missing?
> > >
> > > Good question, since that was exactly my initial idea as well.
> > >
> > > Key point is that our Vulkan guys came back and said that this
> > > wouldn't be sufficient, but I honestly don't fully understand why.
> >
> > Hm, sounds like we really need those test cases (vk cts on top of mesa,
> > igt) so we can talk about the exact corner cases we care about and why.
> >
> > I guess one thing that might happen is that userspace leaves out a
> > number and never sets that fence, relying on the >= semantics of the
> > monitored fence to unblock that thread. E.g. when skipping a frame in
> > one of the auxiliary workloads. For that case we'd need to make sure we
> > don't just wait for the given fence to materialize, but also for any
> > fences later in the timeline.
> >
> > But we can't decide that without understanding the actual use-case that
> > needs to be supported at the other end of the stack, and what all the
> > bits in between should look like.
> >
> > I guess we're back to "uapi design without userspace doesn't make
> > sense" ...
> >
> > > Anyway that's why David came up with using the fence array to wait for
> > > all previously added fences, which I then later on extended into this
> > > chain container.
> > >
> > > I have to admit that implemented this way it is far more defensive.
> > > E.g. there are far fewer things userspace can do wrong.
> > >
> > > The principal idea is that when they mess things up they are always
> > > going to wait more than necessary, but never less.
> >
> > That seems against the spirit of vulkan, which is very much about "you
> > get all the pieces". It also might dig us a hole in the future, if we
> > ever get around to moving towards a WDDM2 style memory management model.
> > For future proofing I think it would make sense if we implemented the
> > minimal uapi we need for vk timelines, not the strictest guarantees we
> > can get away with (without performance impact) on current drivers.
> > -Daniel
> >
> > > Christian.
> > >
> > > > -Daniel
> > > >
> > > >> ---
> > > >>  drivers/gpu/drm/drm_syncobj.c | 37 +++++++++++++++++++++++++++++++++++
> > > >>  include/drm/drm_syncobj.h     |  5 +++++
> > > >>  2 files changed, 42 insertions(+)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> > > >> index e19525af0cce..51f798e2194f 100644
> > > >> --- a/drivers/gpu/drm/drm_syncobj.c
> > > >> +++ b/drivers/gpu/drm/drm_syncobj.c
> > > >> @@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
> > > >>  	spin_unlock(&syncobj->lock);
> > > >>  }
> > > >>
> > > >> +/**
> > > >> + * drm_syncobj_add_point - add new timeline point to the syncobj
> > > >> + * @syncobj: sync object to add timeline point to
> > > >> + * @chain: chain node to use to add the point
> > > >> + * @fence: fence to encapsulate in the chain node
> > > >> + * @point: sequence number to use for the point
> > > >> + *
> > > >> + * Add the chain node as new timeline point to the syncobj.
> > > >> + */
> > > >> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> > > >> +			   struct dma_fence_chain *chain,
> > > >> +			   struct dma_fence *fence,
> > > >> +			   uint64_t point)
> > > >> +{
> > > >> +	struct syncobj_wait_entry *cur, *tmp;
> > > >> +	struct dma_fence *prev;
> > > >> +
> > > >> +	dma_fence_get(fence);
> > > >> +
> > > >> +	spin_lock(&syncobj->lock);
> > > >> +
> > > >> +	prev = rcu_dereference_protected(syncobj->fence,
> > > >> +					 lockdep_is_held(&syncobj->lock));
> > > >> +	dma_fence_chain_init(chain, prev, fence, point);
> > > >> +	rcu_assign_pointer(syncobj->fence, &chain->base);
> > > >> +
> > > >> +	list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> > > >> +		list_del_init(&cur->node);
> > > >> +		syncobj_wait_syncobj_func(syncobj, cur);
> > > >> +	}
> > > >> +	spin_unlock(&syncobj->lock);
> > > >> +
> > > >> +	/* Walk the chain once to trigger garbage collection */
> > > >> +	dma_fence_chain_for_each(prev, fence);
> > > >> +}
> > > >> +EXPORT_SYMBOL(drm_syncobj_add_point);
> > > >> +
> > > >>  /**
> > > >>   * drm_syncobj_replace_fence - replace fence in a sync object.
> > > >>   * @syncobj: Sync object to replace fence in
> > > >>
> > > >> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
> > > >> index 7c6ed845c70d..8acb4ae4f311 100644
> > > >> --- a/include/drm/drm_syncobj.h
> > > >> +++ b/include/drm/drm_syncobj.h
> > > >> @@ -27,6 +27,7 @@
> > > >>  #define __DRM_SYNCOBJ_H__
> > > >>
> > > >>  #include "linux/dma-fence.h"
> > > >> +#include "linux/dma-fence-chain.h"
> > > >>
> > > >>  /**
> > > >>   * struct drm_syncobj - sync object.
> > > >> @@ -110,6 +111,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
> > > >>
> > > >>  struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
> > > >>  				     u32 handle);
> > > >> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> > > >> +			   struct dma_fence_chain *chain,
> > > >> +			   struct dma_fence *fence,
> > > >> +			   uint64_t point);
> > > >>  void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
> > > >>  			       struct dma_fence *fence);
> > > >>  int drm_syncobj_find_fence(struct drm_file *file_private,
> > > >> --
> > > >> 2.17.1
> > > >>
> > > >> _______________________________________________
> > > >> Intel-gfx mailing list
> > > >> intel-...@lists.freedesktop.org
> > > >> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
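As a footnote on the ">=" semantics discussed in the thread: with a chain
container, a wait for point N can be satisfied by any node whose seqno is
>= N, so a point userspace never signals is still unblocked by the next
larger one. A rough sketch of such a lookup, loosely modelled on the
dma_fence_chain code in this series (a hypothetical helper; reference
counting and blocking for not-yet-materialized points are simplified, and
this is not the actual kernel implementation):

    /* Hypothetical sketch: resolve a timeline wait for "point" against
     * a fence chain.  Any node with seqno >= point can satisfy the
     * wait. */
    static struct dma_fence *timeline_find_point(struct dma_fence *head,
                                                 u64 point)
    {
        struct dma_fence *fence;

        /* The point has not materialized yet; the caller has to block
         * until the timeline has advanced far enough. */
        if (head->seqno < point)
            return NULL;

        dma_fence_chain_for_each(fence, head) {
            struct dma_fence_chain *chain = to_dma_fence_chain(fence);

            /* Stop at the oldest node that still covers "point":
             * either a plain fence terminating the chain, or a chain
             * node whose predecessor is already below the wait value. */
            if (!chain || chain->prev_seqno < point)
                return fence;
        }
        return NULL;
    }

The chain walk at the end of drm_syncobj_add_point() above relies on the
same property when it garbage-collects already-signalled nodes.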