
Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König


Am 10.08.2018 um 11:21 schrieb Daniel Vetter:

[SNIP]
Then don't track _any_ of the amdgpu internal fences in the reservation object:
- 1 reservation object that you hand to ttm, for use internally within amdgpu
- 1 reservation object that you attach to the dma-buf (or get from the
imported dma-buf), where you play all the tricks to fake fences.
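A minimal sketch of the two-object split suggested in the list above, as a user-space toy model. All names here (struct toy_bo, track_internal, publish_exclusive) are hypothetical and are not amdgpu, ttm or dma-buf API; the point is only that driver-internal fences never become visible in the object other drivers look at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 8

/* Hypothetical stand-in for a fence: it only records completion. */
struct fence { bool signaled; };

/* Toy reservation object: one exclusive slot plus a list of shared fences. */
struct resv {
    struct fence *excl;
    struct fence *shared[MAX_SHARED];
    size_t shared_count;
};

/* Toy buffer object carrying the two reservation objects. */
struct toy_bo {
    struct resv internal;  /* handed to ttm, used only inside the driver */
    struct resv exported;  /* attached to the dma-buf, seen by other drivers */
};

/* Driver-internal submissions are tracked only in the internal object. */
static bool track_internal(struct toy_bo *bo, struct fence *f)
{
    assert(f != NULL);
    if (bo->internal.shared_count == MAX_SHARED)
        return false;
    bo->internal.shared[bo->internal.shared_count++] = f;
    return true;
}

/* At hand-off, publish a single fence in the exported exclusive slot. */
static void publish_exclusive(struct toy_bo *bo, struct fence *f)
{
    assert(f != NULL);
    bo->exported.excl = f;
}
```

In this model an importer that only inspects bo.exported sees nothing until publish_exclusive() is called, however many fences were added through track_internal().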


Well that is an interesting idea. Going to try that one out.

Regards,
Christian.


-Daniel


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter

On Fri, Aug 10, 2018 at 11:14 AM, Christian König
 wrote:
> Am 10.08.2018 um 10:29 schrieb Daniel Vetter:
>>
>> [SNIP]
>> I'm only interested in the case of shared buffers. And for those you
>> _do_ pessimistically assume that all access must be implicitly synced.
>> Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
>> makes sense that you don't bother with it.
>
>
> See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.

That's for radv. Won't be enough for EGL_ANDROID_native_fence_sync,
because you cannot know at buffer allocation time how the fencing will
be done in all cases.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter

On Fri, Aug 10, 2018 at 11:14 AM, Christian König
 wrote:
> Am 10.08.2018 um 10:29 schrieb Daniel Vetter:
>>
>> [SNIP]
>> I'm only interested in the case of shared buffers. And for those you
>> _do_ pessimistically assume that all access must be implicitly synced.
>> Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
>> makes sense that you don't bother with it.
>
>
> See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.
>
>
>>
>>>> - as a consequence, amdgpu needs to pessimistically assume that all
>>>> writes to shared buffer need to obey implicit fencing rules.
>>>> - for shared buffers (across process or drivers) implicit fencing does
>>>> _not_ allow concurrent writers. That limitation is why people want to
>>>> do explicit fencing, and it's the reason why there's only 1 slot for
>>>> an exclusive. Note I really mean concurrent here, a queue of in-flight
>>>> writes by different batches is perfectly fine. But it's a fully
>>>> ordered queue of writes.
>>>> - but as a consequence of amdgpu's lack of implicit fencing and hence
>>>> need to pessimistically assume there's multiple write fences amdgpu
>>>> needs to put multiple fences behind the single exclusive slot. This is
>>>> a limitation imposed by the amdgpu stack, not something inherent to
>>>> how implicit fencing works.
>>>> - Chris Wilson's patch implements all this (and afaics with a bit more
>>>> coffee, correctly).
>>>>
>>>> If you want to be less pessimistic in amdgpu for shared buffers, you
>>>> need to start tracking which shared buffer access need implicit and
>>>> which explicit sync. What you can't do is suddenly create more than 1
>>>> exclusive fence, that's not how implicit fencing works. Another thing
>>>> you cannot do is force everyone else (in non-amdgpu or core code) to
>>>> sync against _all_ writes, because that forces implicit syncing. Which
>>>> people very much don't want.
>>>
>>>
>>> I also do see the problem that most other hardware doesn't need that
>>> functionality, because it is driven by a single engine. That's why I
>>> tried
>>> to keep the overhead as low as possible.
>>>
>>> But at least for amdgpu (and I strongly suspect for nouveau as well) it
>>> is
>>> absolutely vital in a number of cases to allow concurrent accesses from
>>> the
>>> same client even when the BO is then later used with implicit
>>> synchronization.
>>>
>>> This is also the reason why the current workaround is so problematic for
>>> us.
>>> Cause as soon as the BO is shared with another (non-amdgpu) device all
>>> command submissions to it will be serialized even when they come from the
>>> same client.
>>>
>>> Would it be an option to extend the concept of the "owner" of the BO amdgpu
>>> uses to other drivers as well?
>>>
>>> When you already have explicit synchronization inside your client, but
>>> not
>>> between clients (e.g. still uses DRI2 or DRI3), this could also be rather
>>> beneficial for others as well.
>>
>> Again: How you synchronize your driver internal rendering is totally
>> up to you. If you see an exclusive fence by amdgpu, and submit new
>> rendering by amdgpu, you can totally ignore the exclusive fence. The
>> only api contracts for implicit fencing are between drivers for shared
>> buffers. If you submit rendering to a shared buffer in parallel, all
>> without attaching an exclusive fence that's perfectly fine, but
>> somewhen later on, depending upon protocol (glFlush or glxSwapBuffers
>> or whatever) you have to collect all those concurrent write hazards
>> and bake them into 1 single exclusive fence for implicit fencing.
>>
>> Atm (and Chris seems to concur) the amdgpu uapi doesn't allow you to
>> do that, so for anything shared you have to be super pessimistic.
>> Adding a HAND_OFF_FOR_IMPLICIT_FENCING flag/ioctl would probably fix
>> that. Only when that flag is set would you take all shared write
>> hazards and bake them into one exclusive fence for hand-off to the
>> next driver. You'd also need the same when receiving an implicitly
>> fenced buffer, to make sure that your concurrent writes do synchronize
>> with reading (aka shared fences) done by other drivers. With a bunch
>> of trickery and hacks it might be possible to infer this from current
>> ioctls even, but you need to be really careful.
>
>
> A new uapi is out of question because we need to be backward compatible.

Since when is new uapi out of the question for a performance improvement?

>> And you're right that amdgpu seems to be the only (or one of the only)
>> drivers which do truly concurrent rendering to the same buffer (not
>> just concurrent rendering to multiple buffers all suballocated from
>> the same bo). But we can't fix this in the kernel with the tricks you
>> propose, because without such an uapi extension (or inference) we
>> can't tell the implicit fencing from the explicit fencing case.
>
>
> Sure we can. As I said for amdgpu that is absolutely no problem at all.
>
> In your terminology all rendering from the same client to a BO

Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Christian König


Am 10.08.2018 um 10:29 schrieb Daniel Vetter:

[SNIP]
I'm only interested in the case of shared buffers. And for those you
_do_ pessimistically assume that all access must be implicitly synced.
Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
makes sense that you don't bother with it.


See flag AMDGPU_GEM_CREATE_EXPLICIT_SYNC.




- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffer need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing and hence
need to pessimistically assume there's multiple write fences amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by the amdgpu stack, not something inherent to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).

If you want to be less pessimistic in amdgpu for shared buffers, you
need to start tracking which shared buffer access need implicit and
which explicit sync. What you can't do is suddenly create more than 1
exclusive fence, that's not how implicit fencing works. Another thing
you cannot do is force everyone else (in non-amdgpu or core code) to
sync against _all_ writes, because that forces implicit syncing. Which
people very much don't want.


I also do see the problem that most other hardware doesn't need that
functionality, because it is driven by a single engine. That's why I tried
to keep the overhead as low as possible.

But at least for amdgpu (and I strongly suspect for nouveau as well) it is
absolutely vital in a number of cases to allow concurrent accesses from the
same client even when the BO is then later used with implicit
synchronization.

This is also the reason why the current workaround is so problematic for us.
Cause as soon as the BO is shared with another (non-amdgpu) device all
command submissions to it will be serialized even when they come from the
same client.

Would it be an option to extend the concept of the "owner" of the BO amdgpu
uses to other drivers as well?

When you already have explicit synchronization inside your client, but not
between clients (e.g. still uses DRI2 or DRI3), this could also be rather
beneficial for others as well.

Again: How you synchronize your driver internal rendering is totally
up to you. If you see an exclusive fence by amdgpu, and submit new
rendering by amdgpu, you can totally ignore the exclusive fence. The
only api contracts for implicit fencing are between drivers for shared
buffers. If you submit rendering to a shared buffer in parallel, all
without attaching an exclusive fence that's perfectly fine, but
somewhen later on, depending upon protocol (glFlush or glxSwapBuffers
or whatever) you have to collect all those concurrent write hazards
and bake them into 1 single exclusive fence for implicit fencing.

Atm (and Chris seems to concur) the amdgpu uapi doesn't allow you to
do that, so for anything shared you have to be super pessimistic.
Adding a HAND_OFF_FOR_IMPLICIT_FENCING flag/ioctl would probably fix
that. Only when that flag is set would you take all shared write
hazards and bake them into one exclusive fence for hand-off to the
next driver. You'd also need the same when receiving an implicitly
fenced buffer, to make sure that your concurrent writes do synchronize
with reading (aka shared fences) done by other drivers. With a bunch
of trickery and hacks it might be possible to infer this from current
ioctls even, but you need to be really careful.
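The hand-off step described above can be sketched as follows. This is a user-space toy model under stated assumptions, not the amdgpu uapi: HAND_OFF_FOR_IMPLICIT_FENCING is only a proposal in this thread, and struct fence, struct resv, fence_done() and hand_off() are hypothetical names. The merged fence completes once every concurrent write (and any previous exclusive fence) has completed, so the next driver still sees exactly one exclusive fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 8

struct fence {
    bool signaled;                        /* state of a leaf fence */
    struct fence *parts[MAX_WRITES + 1];  /* children of a merged fence */
    size_t nparts;
};

/* A merged fence is done when all of its parts are done. */
static bool fence_done(const struct fence *f)
{
    if (f->nparts == 0)
        return f->signaled;
    for (size_t i = 0; i < f->nparts; i++)
        if (!fence_done(f->parts[i]))
            return false;
    return true;
}

struct resv {
    struct fence *excl;                 /* the single exclusive slot */
    struct fence *writes[MAX_WRITES];   /* concurrent same-client writes */
    size_t nwrites;
};

/*
 * Hypothetical hand-off: bake all pending concurrent writes, plus the
 * previous exclusive fence if any, into one fence and install it in the
 * exclusive slot. The caller provides storage for the merged fence.
 */
static void hand_off(struct resv *r, struct fence *merged)
{
    assert(merged != r->excl);
    merged->nparts = 0;
    if (r->excl)
        merged->parts[merged->nparts++] = r->excl;
    for (size_t i = 0; i < r->nwrites; i++)
        merged->parts[merged->nparts++] = r->writes[i];
    merged->signaled = (merged->nparts == 0);  /* nothing pending */
    r->nwrites = 0;
    r->excl = merged;
}
```

Until hand_off() runs, the writes list can grow without touching the exclusive slot, which is the "not pessimistic" behaviour for same-client rendering; the cost of merging is only paid at the protocol boundary.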


A new uapi is out of question because we need to be backward compatible.


And you're right that amdgpu seems to be the only (or one of the only)
drivers which do truly concurrent rendering to the same buffer (not
just concurrent rendering to multiple buffers all suballocated from
the same bo). But we can't fix this in the kernel with the tricks you
propose, because without such an uapi extension (or inference) we
can't tell the implicit fencing from the explicit fencing case.


Sure we can. As I said for amdgpu that is absolutely no problem at all.

In your terminology all rendering from the same client to a BO is 
explicitly fenced, while all rendering from different clients are 
implicit fenced.



And for shared buffers with explicit fencing we _must_ _not_ sync against
all writes. owner won't help here, because it's still not tracking
whether something is explicit or implicit synced.


Implicit syncing can be disabled by giving the 
AMDGPU_GEM_CREATE_EXPLICIT_SYNC flag while creating the BO.
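The owner rule described in this message can be sketched as a small user-space model. It assumes a single "last client" per BO and a per-BO opt-out set at creation; struct toy_bo, needs_implicit_sync() and submit() are hypothetical names, and the real driver's bookkeeping is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

struct toy_bo {
    int last_owner;      /* client that used the BO last, or NO_OWNER */
    bool explicit_sync;  /* models a creation-time opt-out of implicit sync */
};

/* Same client: no implicit sync. Different client: implicit sync. */
static bool needs_implicit_sync(const struct toy_bo *bo, int client)
{
    if (bo->explicit_sync)
        return false;
    return bo->last_owner != NO_OWNER && bo->last_owner != client;
}

/* Record a submission and report whether it had to sync implicitly. */
static bool submit(struct toy_bo *bo, int client)
{
    bool sync = needs_implicit_sync(bo, client);

    assert(client != NO_OWNER);
    bo->last_owner = client;
    return sync;
}
```

Note that the decision here is fixed per BO and per client; it cannot express a buffer whose fencing mode changes from one use to the next, which is the objection raised against relying on an allocation-time flag.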



We've cheated a bit with most other drivers in this area, also becaus

Re: RFC: Add write flag to reservation object fences

2018-08-10 Thread Daniel Vetter

On Thu, Aug 9, 2018 at 4:54 PM, Christian König
 wrote:
> Am 09.08.2018 um 16:22 schrieb Daniel Vetter:
>>
>> On Thu, Aug 9, 2018 at 3:58 PM, Christian König
>>  wrote:
>>>
>>> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

>>>> On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
>>>> [SNIP]
>>>
>>> See to me the explicit fence in the reservation object is not even
>>> remotely
>>> related to implicit or explicit synchronization.
>>
>> Hm, I guess that's the confusion then. The only reason we have the
>> exclusive fence is to implement cross-driver implicit syncing. What
>> else you do internally in your driver doesn't matter, as long as you
>> keep up that contract.
>>
>> And it's intentionally not called write_fence or anything like that,
>> because that's not what it tracks.
>>
>> Of course any buffer moves the kernel does also must be tracked in the
>> exclusive fence, because userspace cannot know about these. So you
>> might have an exclusive fence set and also an explicit fence passed in
>> through the atomic ioctl. Aside: Right now all drivers only observe
>> one or the other, not both, so will break as soon as we start moving
>> shared buffers around. At least on Android or anything else using
>> explicit fencing.
>
>
> Actually both radeon and nouveau use the approach that shared fences need to
> be waited on as well when they don't come from the current driver.
>
>>
>> So here's my summary, as I understand things right now:
>> - for non-shared buffers at least, amdgpu uses explicit fencing, and
>> hence all fences caused by userspace end up as shared fences, whether
>> that's writes or reads. This means you end up with possibly multiple
>> write fences, but never any exclusive fences.
>> - for non-shared buffers the only exclusive fences amdgpu sets are for
>> buffer moves done by the kernel.
>> - amdgpu (kernel + userspace combo here) does not seem to have a
>> concept/tracking for when a buffer is used with implicit or explicit
>> fencing. It does however track all writes.
>
>
> No, that is incorrect. It tracks all accesses to a buffer object in the form
> of shared fences, we don't care if it is a write or not.
>
> What we track as well is which client uses a BO last and as long as the same
> client uses the BO we don't add any implicit synchronization.
>
> Only when a BO is used by another client we have implicit synchronization
> for all command submissions. This behavior can be disabled with a flag during
> BO creation.

I'm only interested in the case of shared buffers. And for those you
_do_ pessimistically assume that all access must be implicitly synced.
Iirc amdgpu doesn't support EGL_ANDROID_native_fence_sync, so this
makes sense that you don't bother with it.

>> - as a consequence, amdgpu needs to pessimistically assume that all
>> writes to shared buffer need to obey implicit fencing rules.
>> - for shared buffers (across process or drivers) implicit fencing does
>> _not_ allow concurrent writers. That limitation is why people want to
>> do explicit fencing, and it's the reason why there's only 1 slot for
>> an exclusive. Note I really mean concurrent here, a queue of in-flight
>> writes by different batches is perfectly fine. But it's a fully
>> ordered queue of writes.
>> - but as a consequence of amdgpu's lack of implicit fencing and hence
>> need to pessimistically assume there's multiple write fences amdgpu
>> needs to put multiple fences behind the single exclusive slot. This is
>> a limitation imposed by the amdgpu stack, not something inherent to
>> how implicit fencing works.
>> - Chris Wilson's patch implements all this (and afaics with a bit more
>> coffee, correctly).
>>
>> If you want to be less pessimistic in amdgpu for shared buffers, you
>> need to start tracking which shared buffer access need implicit and
>> which explicit sync. What you can't do is suddenly create more than 1
>> exclusive fence, that's not how implicit fencing works. Another thing
>> you cannot do is force everyone else (in non-amdgpu or core code) to
>> sync against _all_ writes, because that forces implicit syncing. Which
>> people very much don't want.
>
>
> I also do see the problem that most other hardware doesn't need that
> functionality, because it is driven by a single engine. That's why I tried
> to keep the overhead as low as possible.
>
> But at least for amdgpu (and I strongly suspect for nouveau as well) it is
> absolutely vital in a number of cases to allow concurrent accesses from the
> same client even when the BO is then later used with implicit
> synchronization.
>
> This is also the reason why the current workaround is so problematic for us.
> Cause as soon as the BO is shared with another (non-amdgpu) device all
> command submissions to it will be serialized even when they come from the
> same client.
>
> Would it be an option to extend the concept of the "owner" of the BO amdgpu
> uses to other drivers as well?
>
> When you already have explicit synchronization

Re: RFC: Add write flag to reservation object fences

2018-08-09 Thread Christian König


Am 09.08.2018 um 16:22 schrieb Daniel Vetter:

On Thu, Aug 9, 2018 at 3:58 PM, Christian König
 wrote:

Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
[SNIP]

See to me the explicit fence in the reservation object is not even remotely
related to implicit or explicit synchronization.

Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.

And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.

Of course any buffer moves the kernel does also must be tracked in the
exclusive fence, because userspace cannot know about these. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: Right now all drivers only observe
one or the other, not both, so will break as soon as we start moving
shared buffers around. At least on Android or anything else using
explicit fencing.


Actually both radeon and nouveau use the approach that shared fences 
need to be waited on as well when they don't come from the current driver.




So here's my summary, as I understand things right now:
- for non-shared buffers at least, amdgpu uses explicit fencing, and
hence all fences caused by userspace end up as shared fences, whether
that's writes or reads. This means you end up with possibly multiple
write fences, but never any exclusive fences.
- for non-shared buffers the only exclusive fences amdgpu sets are for
buffer moves done by the kernel.
- amdgpu (kernel + userspace combo here) does not seem to have a
concept/tracking for when a buffer is used with implicit or explicit
fencing. It does however track all writes.


No, that is incorrect. It tracks all accesses to a buffer object in the 
form of shared fences, we don't care if it is a write or not.


What we track as well is which client uses a BO last and as long as the 
same client uses the BO we don't add any implicit synchronization.


Only when a BO is used by another client we have implicit 
synchronization for all command submissions. This behavior can be 
disabled with a flag during BO creation.



- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffer need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing and hence
need to pessimistically assume there's multiple write fences amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by the amdgpu stack, not something inherent to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).

If you want to be less pessimistic in amdgpu for shared buffers, you
need to start tracking which shared buffer access need implicit and
which explicit sync. What you can't do is suddenly create more than 1
exclusive fence, that's not how implicit fencing works. Another thing
you cannot do is force everyone else (in non-amdgpu or core code) to
sync against _all_ writes, because that forces implicit syncing. Which
people very much don't want.


I also do see the problem that most other hardware doesn't need that 
functionality, because it is driven by a single engine. That's why I 
tried to keep the overhead as low as possible.


But at least for amdgpu (and I strongly suspect for nouveau as well) it 
is absolutely vital in a number of cases to allow concurrent accesses 
from the same client even when the BO is then later used with implicit 
synchronization.


This is also the reason why the current workaround is so problematic for 
us. Cause as soon as the BO is shared with another (non-amdgpu) device 
all command submissions to it will be serialized even when they come 
from the same client.


Would it be an option to extend the concept of the "owner" of the BO amdgpu 
uses to other drivers as well?


When you already have explicit synchronization inside your client, but 
not between clients (e.g. still uses DRI2 or DRI3), this could also be 
rather beneficial for others as well.


Regards,
Christian.


-Daniel



Re: RFC: Add write flag to reservation object fences

2018-08-09 Thread Daniel Vetter

On Thu, Aug 9, 2018 at 3:58 PM, Christian König
 wrote:
> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:
>>
>> On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
>>>
>>> Hi everyone,
>>>
>>> This set of patches tries to improve read after write hazard handling
>>> for reservation objects.
>>>
>>> It allows us to specify for each shared fence if it represents a write
>>> operation.
>>>
>>> Based on this the i915 driver is modified to always wait for all writes
>>> before pageflip and the previously used workaround is removed from
>>> amdgpu.
>>
>> Hm, I thought after the entire discussions we agreed again that it's _not_
>> the write hazard we want to track, but whether there's an exclusive fence
>> that must be observed for implicit buffer sync. That's why it's called the
>> exclusive fence, not the write fence!
>>
>> If you want multiple of those, I guess we could add those, but that
>> doesn't really make sense - how exactly did you end up with multiple
>> exclusive fences in the first place?
>
>
> Maybe you misunderstood me, we don't have multiple exclusive fences.
>
> What we have are multiple writers which write to the BO. In other words
> multiple engines which compose the content of the BO at the same time.
>
> For page flipping we need to wait for all of them to complete.
>
>> i915 (and fwiw, any other driver) does _not_ want to observe all write
>> fences attached to a dma-buf. We want to _only_ observe the single
>> exclusive fence used for implicit buffer sync, which might or might not
>> exist. Otherwise the entire point of having explicit sync and explicit
>> fences in the atomic ioctl is out of the window and the use case of 2
>> draw/flip loops using a single buffer is defeated.
>
>
> What do you mean by that?
>
> Even for the atomic IOCTL with implicit fencing I strongly suspect that we
> can wait for multiple fences before doing the flip. Otherwise it would not
> really be useful to us.
>
>> Again: How exactly you construct that exclusive fences, and how exactly
>> the kernel and userspace cooperate to figure out when to set the exclusive
>> fences, is 100% up to amdgpu. If you do explicit sync by default, and only
>> switch to implicit sync (and setting the exclusive fence) as needed,
>> that's perfectly fine.  No need at all to leak that into core code and
>> confuse everyone that there's multiple exclusive fences they need to
>> somehow observe.
>
>
> I simply never have a single exclusive fence provided by userspace.
>
> I always have multiple command submissions accessing the buffer at the same
> time.
>
> See to me the explicit fence in the reservation object is not even remotely
> related to implicit or explicit synchronization.

Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.

And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.

Of course any buffer moves the kernel does also must be tracked in the
exclusive fence, because userspace cannot know about these. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: Right now all drivers only observe
one or the other, not both, so will break as soon as we start moving
shared buffers around. At least on Android or anything else using
explicit fencing.
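As a hedged illustration of the aside above, a flip that observes both fences (rather than one or the other) could collect its dependencies like this. It is a user-space sketch; struct fence and flip_deps() are hypothetical names and do not correspond to any atomic ioctl helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence: it only records completion. */
struct fence { bool signaled; };

/*
 * Collect what a flip has to wait for: the exclusive fence from the
 * reservation object (for example a kernel buffer move) and the explicit
 * fence passed in by userspace. Either may be NULL. Returns the number
 * of entries written to out, which must have room for two.
 */
static size_t flip_deps(struct fence *excl, struct fence *in_fence,
                        struct fence *out[2])
{
    size_t n = 0;

    assert(out != NULL);
    if (excl && !excl->signaled)
        out[n++] = excl;
    if (in_fence && !in_fence->signaled && in_fence != excl)
        out[n++] = in_fence;
    return n;
}
```

A driver that checks only in_fence would miss a pending buffer move, and one that checks only excl would ignore the userspace fence; the sketch waits for whichever of the two are still pending.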

So here's my summary, as I understand things right now:
- for non-shared buffers at least, amdgpu uses explicit fencing, and
hence all fences caused by userspace end up as shared fences, whether
that's writes or reads. This means you end up with possibly multiple
write fences, but never any exclusive fences.
- for non-shared buffers the only exclusive fences amdgpu sets are for
buffer moves done by the kernel.
- amdgpu (kernel + userspace combo here) does not seem to have a
concept/tracking for when a buffer is used with implicit or explicit
fencing. It does however track all writes.
- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffer need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing and hence
need to pessimistically assume there's multiple write fences amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by the amdgpu stack, not something inherent to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).
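The implicit fencing rules in the summary above can be sketched with a toy reservation object: one exclusive slot and a list of shared fences. This is a user-space model assuming the conventional rule that readers wait for the exclusive fence while a new writer waits for the exclusive fence and all shared fences; struct resv, deps_for_read(), deps_for_write() and set_exclusive() are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 8

struct fence { bool signaled; };

struct resv {
    struct fence *excl;                /* only 1 slot for an exclusive */
    struct fence *shared[MAX_SHARED];
    size_t shared_count;
};

/* A reader under implicit sync depends only on the exclusive fence. */
static size_t deps_for_read(const struct resv *r, struct fence **out)
{
    size_t n = 0;

    if (r->excl && !r->excl->signaled)
        out[n++] = r->excl;
    return n;
}

/* A writer depends on the exclusive fence and on every shared fence.
 * out must have room for MAX_SHARED + 1 entries. */
static size_t deps_for_write(const struct resv *r, struct fence **out)
{
    size_t n = deps_for_read(r, out);

    for (size_t i = 0; i < r->shared_count; i++)
        if (!r->shared[i]->signaled)
            out[n++] = r->shared[i];
    return n;
}

/* A new exclusive fence replaces the old one and clears the shared
 * list, so writes form one fully ordered queue. */
static void set_exclusive(struct resv *r, struct fence *f)
{
    assert(f != NULL);
    r->excl = f;
    r->shared_count = 0;
}
```

Because set_exclusive() is only valid after waiting on everything deps_for_write() returns, two writers can be queued back to back but never run concurrently in this model, which is the limitation the summary describes.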

If you want to be less pes

Re: RFC: Add write flag to reservation object fences

2018-08-09 Thread Christian König


Am 09.08.2018 um 15:38 schrieb Daniel Vetter:

On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:

Hi everyone,

This set of patches tries to improve read after write hazard handling
for reservation objects.

It allows us to specify for each shared fence if it represents a write
operation.

Based on this the i915 driver is modified to always wait for all writes
before pageflip and the previously used workaround is removed from
amdgpu.

Hm, I thought after the entire discussions we agreed again that it's _not_
the write hazard we want to track, but whether there's an exclusive fence
that must be observed for implicit buffer sync. That's why it's called the
exclusive fence, not the write fence!

If you want multiple of those, I guess we could add those, but that
doesn't really make sense - how exactly did you end up with multiple
exclusive fences in the first place?


Maybe you misunderstood me, we don't have multiple exclusive fences.

What we have are multiple writers which write to the BO. In other words 
multiple engines which compose the content of the BO at the same time.


For page flipping we need to wait for all of them to complete.
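A rough sketch of what a per-fence write flag would mean for a page flip, as a user-space toy model. The actual patches are not quoted in this thread, so struct shared_fence, its is_write field, add_shared() and flip_wait_count() are hypothetical names chosen only to illustrate the idea of waiting for every writer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 8

struct shared_fence {
    bool signaled;
    bool is_write;   /* the proposed per-fence write flag */
};

struct resv {
    struct shared_fence shared[MAX_SHARED];
    size_t shared_count;
};

/* Add a shared fence for one engine; returns NULL when the list is full. */
static struct shared_fence *add_shared(struct resv *r, bool is_write)
{
    struct shared_fence *f;

    if (r->shared_count == MAX_SHARED)
        return NULL;
    f = &r->shared[r->shared_count++];
    f->signaled = false;
    f->is_write = is_write;
    return f;
}

/* Number of fences a flip still has to wait for: all pending writers. */
static size_t flip_wait_count(const struct resv *r)
{
    size_t n = 0;

    for (size_t i = 0; i < r->shared_count; i++)
        if (r->shared[i].is_write && !r->shared[i].signaled)
            n++;
    return n;
}
```

With two engines composing the same BO, flip_wait_count() stays non-zero until both writers have signaled, while pending readers do not delay the flip.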


i915 (and fwiw, any other driver) does _not_ want to observe all write
fences attached to a dma-buf. We want to _only_ observe the single
exclusive fence used for implicit buffer sync, which might or might not
exist. Otherwise the entire point of having explicit sync and explicit
fences in the atomic ioctl is out of the window and the use case of 2
draw/flip loops using a single buffer is defeated.


What do you mean by that?

Even for the atomic IOCTL with implicit fencing I strongly suspect that 
we can wait for multiple fences before doing the flip. Otherwise it 
would not really be useful to us.



Again: How exactly you construct that exclusive fences, and how exactly
the kernel and userspace cooperate to figure out when to set the exclusive
fences, is 100% up to amdgpu. If you do explicit sync by default, and only
switch to implicit sync (and setting the exclusive fence) as needed,
that's perfectly fine.  No need at all to leak that into core code and
confuse everyone that there's multiple exclusive fences they need to
somehow observe.


I simply never have a single exclusive fence provided by userspace.

I always have multiple command submissions accessing the buffer at the 
same time.


See to me the explicit fence in the reservation object is not even 
remotely related to implicit or explicit synchronization.


Regards,
Christian.



Cheers, Daniel



Re: RFC: Add write flag to reservation object fences

2018-08-09 Thread Daniel Vetter

On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
> Hi everyone,
> 
> This set of patches tries to improve read after write hazard handling
> for reservation objects.
> 
> It allows us to specify for each shared fence if it represents a write
> operation.
> 
> Based on this the i915 driver is modified to always wait for all writes
> before pageflip and the previously used workaround is removed from
> amdgpu.

Hm, I thought after the entire discussions we agreed again that it's _not_
the write hazard we want to track, but whether there's an exclusive fence
that must be observed for implicit buffer sync. That's why it's called the
exclusive fence, not the write fence!

If you want multiple of those, I guess we could add those, but that
doesn't really make sense - how exactly did you end up with multiple
exclusive fences in the first place?

i915 (and fwiw, any other driver) does _not_ want to observe all write
fences attached to a dma-buf. We want to _only_ observe the single
exclusive fence used for implicit buffer sync, which might or might not
exist. Otherwise the entire point of having explicit sync and explicit
fences in the atomic ioctl is out of the window and the use case of 2
draw/flip loops using a single buffer is defeated.

Again: How exactly you construct that exclusive fences, and how exactly
the kernel and userspace cooperate to figure out when to set the exclusive
fences, is 100% up to amdgpu. If you do explicit sync by default, and only
switch to implicit sync (and setting the exclusive fence) as needed,
that's perfectly fine.  No need at all to leak that into core code and
confuse everyone that there's multiple exclusive fences they need to
somehow observe.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
