On Wed, 2007-12-26 at 14:46 +0100, Thomas Hellström wrote:

> Keith Packard wrote:
> > On Tue, 2007-12-25 at 14:24 +0100, Thomas Hellström wrote:
> >
> >> The fence type bits are solely intended to indicate different completion stages, so if the union of access modes defined as (RWX) is the same for all fence objects of a particular driver, you're right. Fence flushing of a particular fence type is intended to make sure that that fence type will eventually signal. This functionality (which may seem awkward and unnecessary at first) can be used to turn on / off irqs in the presence / absence of waiters and can be used to perform things like RW flushing on Intel chipsets.
> >
> > RW flushing is independent of fencing for most use cases -- when switching between reading and writing operations in the ring, there's no need to fence the operation, only to place a flush in the command stream. The only time when flushing is related to fencing is when you need to wait for a fence to expire with a flush operation in front of it.
> >
> > Mixing flushing and fencing in the same mechanism has made that very difficult to understand, and I believe it has removed the common optimization where flushing needn't wait on a fence.
>
> It seems that the reason we don't agree on this is that we have different views of what a fence is, and that we're approaching the same problem from different directions.
>
> Let's start with the fence concept in the buffer manager code. A fence object represents a sequence of GPU commands, and carries information about the completion and error status of this sequence of commands. Each fence_type flag is a step in this completion status, and a fence is not completely signaled until all fence_type flags have signaled.
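Just to make sure we're reading the same model: as I understand it, what you're describing looks roughly like the sketch below. The names here are only illustrative, not the actual drm structures.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative paraphrase of the multi-stage fence model described
 * above, not the real drm code. */

#define EXAMPLE_FENCE_TYPE_EXE  0x01    /* GPU has started the command sequence */
#define EXAMPLE_FENCE_TYPE_RW   0x02    /* commands done and caches flushed */

struct example_fence {
        uint32_t type;          /* union of completion stages this fence reports */
        uint32_t signaled;      /* stages that have signaled so far */
        uint32_t sequence;      /* breadcrumb identifying the command sequence */
        int      error;         /* error status of the sequence */
};

/* A fence is only "completely signaled" once every stage in ->type has
 * been reported in ->signaled. */
static bool example_fence_is_complete(const struct example_fence *f)
{
        return (f->signaled & f->type) == f->type;
}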
Right, I think you're overloading the 'fence' concept here -- a fence should indicate only the completion of a sequence of GPU commands, not any information about external operations, like synchronous flush operations. When a 'fence' passes, I want to know that a sequence of GPU commands has completed.

In particular, I think that adding what amounts to a state machine to the fencing logic is a bad idea -- if the driver needs a state machine to complete a particular fence, the driver should implement that state machine internally. If the driver has no IRQs to drive that state machine, it should, internally, use other kernel mechanisms to perform polling. Conflating the simple fence mechanism with driver-specific state machine operation adds complexity to every driver, instead of only to the drivers which need these additional steps.

I would see each fence having a *single* type, which lets the driver know how to signal it correctly. The driver could attach additional information about the fence outside of the core code, allowing it to sequence the fence through the necessary stages before it becomes signalled. Then, fence users would block waiting for the fence to become signalled; the sequence of operations leading to that state transition would be hidden from the fence users, and the driver would be able to perform whatever code was necessary behind the application's back.

> The meaning of the fence_type flags is driver-dependent. With the current Intel driver implementation, the EXE type means that the GPU has started working on the command sequence. This information is useless to any waiter except waiters wanting to idle batch buffers. When the RW flag has signaled, it means that all commands have finished execution and caches are flushed.

Conflating these two issues means that we cannot distinguish between a GPU-visible flush and a CPU-visible flush. Most of the time, a GPU-visible flush is sufficient, and as we discussed earlier, it isn't sufficient to have flushes inserted into the command stream from user space in the case of shared buffers. Hence, the kernel must expose a separate flush mechanism which can insert the appropriate flush operations into the command stream, mark buffers as flushed and, potentially, wait for EXE fences to expire when the buffer contents are needed by the CPU.

> Some of these flags may not signal automatically. This can have different reasons: one being that IRQs are turned off, since there are no waiters, one that there are no hardware IRQs, one that the GPU needs some kind of flushing. Therefore the concept of fence flushing was invented. Fence flushing is NOT equal to GPU flushing. Rather, fence flushing means "There are waiters for this fence_type, please make sure to update it and make sure it will eventually signal."

By exposing this beyond the driver, though, you've made it effectively impossible for drivers to avoid polling. If you make the 'wait' operation driver-specific, then all of this complexity can be removed from the general code and applications see simple 'wait' operations. I could easily see providing multiple versions of the 'wait' function in the driver-independent code, with a regular blocking one used by hardware that didn't require polling (a rough sketch of the kind of interface I mean is below).

> If we've created a fence object we don't necessarily have to wait on it to signal completely. Each buffer object carries with it per-buffer information about which fence flags to wait on to idle the buffer. Also a fence object can be completely removed before it is completely signaled if there is nothing referencing it.
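For concreteness, the shape of what I'm suggesting above would be something like the following. None of these names exist anywhere; this is only a sketch of the split between a dumb, single-purpose fence and a driver hook that hides whatever staging or polling the hardware needs:

#include <stdint.h>
#include <stdbool.h>

/* Sketch only: a fence reports exactly one thing -- that a command
 * sequence has completed -- and everything else lives behind driver
 * hooks.  All names here are invented for illustration. */

struct example_fence {
        uint32_t sequence;      /* breadcrumb emitted after the commands */
        bool     signaled;      /* set once the command sequence has completed */
};

struct example_driver_ops {
        /* Drivers with reliable completion IRQs point this at a plain
         * blocking wait provided by the core; drivers that must poll,
         * or step an internal state machine, supply their own.  Either
         * way, callers only see "wait until signaled". */
        int (*wait_fence)(struct example_fence *fence, bool interruptible);

        /* Flushing is a separate operation from fencing: make the
         * contents of a set of buffers visible to the CPU (or to later
         * GPU commands), emitting whatever in-ring flushes -- and, if
         * needed, fences to wait on -- internally. */
        int (*flush_buffers)(void **buffers, unsigned int count, bool cpu_visible);
};

The point being that the fence itself never grows extra stages; whether getting to 'signaled' takes one IRQ or a little driver-private state machine is invisible outside the driver.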
By making fence objects signal only information about the EXE status, there isn't any notion of a 'complete' fence signal.

> Nothing says we need to implement Intel RW GPU flushing as a fence flushing operation. In fact, we can restructure the Intel driver to only use EXE fencing in all situations and remove fence-induced GPU flushing, but determining buffer idle in all situations listed above would become very awkward.

Buffer idle is easy to know -- if the buffer has not been accessed since the last flush operation known to have executed, then the buffer is idle. This can be encoded in a separate 'flush' operation which would cause buffers to become idle WRT the CPU/GPU as necessary. Those operations may (in a driver-dependent fashion) use EXE fences, but they certainly wouldn't have to, other than checking to make sure the buffer wasn't queued for GPU access.

> Also, if we assume that the Intel docs are correct about read flushing, or the Intel hw engineers in the future decide to really implement out-of-order GPU command execution, read flushing would be a very common operation for texture buffer mapping or user-space texture memory management.

This would only happen if the GPU also started to include some barrier commands to ensure memory access serialization. If they only provide 'MI_FLUSH' as a barrier, we'll go hit them with sticks until they fix it.

> See the above. Delayed delete is not used by fencing. Rather it uses fences to determine when buffers are idle.

Yup, it just wants to 'flush' the buffer from the GPU.

> In order to do that it must be able to emit fence flushes, which may or may not correspond to GPU flushes.

From what I can understand, these 'fence flushes' are really the mechanism used to poll the driver for fence completion in the absence of IRQ-driven fencing.

> Anyway, the inline access-mode changing flush seems to be the source of most confusion.

In-line access-mode changes are critical to 2D driver performance, so this is rather interesting to me.

> Any implementation of this would need a driver-specific list of pending flushes in the ring to allow for asynchronous operation.

I don't know why this would be driver-specific -- the list of pending accesses is needed for any kind of flushing to work correctly. The driver-independent code would monitor the access modes as they are queued to the ring. Given the previous access and the current access, the driver would be able to compute the correct flush sequence necessary to switch. For drivers that can't flush in-line and must poll, that might involve waiting for the GPU to go idle, flushing various caches by hand, restarting the GPU and posting the subsequent operation. The driver-specific portion is not in monitoring the access modes to various objects; rather, it is strictly in ensuring that access changes evoke the correct flush operation during transition (a rough sketch of that bookkeeping follows below).

> At least if we want to track flush execution and not just flush submission.

Right, I need to track flush execution when transitioning from GPU to CPU access. To do this, you would emit an in-line flush sequence, updating state as appropriate, then emit a fence and block waiting for it to pass.

> Each list entry would carry the flush type and the corresponding breadcrumb. When that breadcrumb is written, the implementation would know that a flush has executed, and would, with the current fence implementation, walk the fence list and update all active fence objects.
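Again for concreteness, the bookkeeping I have in mind is roughly the following. Everything here is invented for illustration -- the point is only that the access-mode tracking and the breadcrumb list are generic, and only the actual flush emission is driver-specific:

#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch: generic tracking of per-buffer access modes and
 * of flushes pending in the ring, keyed by breadcrumb. */

enum example_access { EXAMPLE_ACCESS_NONE, EXAMPLE_ACCESS_READ, EXAMPLE_ACCESS_WRITE };

struct example_pending_flush {
        uint32_t breadcrumb;    /* sequence number written after the flush */
        uint32_t flush_type;    /* driver cookie: which caches it covers */
        struct example_pending_flush *next;
};

struct example_buffer {
        enum example_access last_access;  /* access mode from the previous submission */
        uint32_t last_access_seq;         /* breadcrumb of that submission */
};

/* Generic part: does this access-mode transition need a flush at all?
 * (Purely illustrative policy -- the real rules are chipset-specific,
 * and the driver decides what the flush actually looks like.) */
static bool example_needs_flush(enum example_access prev, enum example_access next)
{
        return prev == EXAMPLE_ACCESS_WRITE && next != EXAMPLE_ACCESS_NONE;
}

/* Once the ring's breadcrumb has advanced past a pending flush, every
 * buffer whose last access was queued at or before that flush is known
 * to be flushed, whether or not any fence happens to reference it. */
static bool example_buffer_flushed(const struct example_buffer *bo,
                                   const struct example_pending_flush *flush,
                                   uint32_t current_breadcrumb)
{
        return (int32_t)(flush->breadcrumb - bo->last_access_seq) >= 0 &&
               (int32_t)(current_breadcrumb - flush->breadcrumb) >= 0;
}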
It's not the fence objects which are interesting here -- it's the flushed state of all buffer objects in the system. Any buffer which has been accessed since the last CPU-synchronization flush would need to be marked as 'idle' wrt the CPU. There wouldn't necessarily be a fence yet pending on this status; the flush operation affects all buffers, not just those being moved from GPU to CPU.

> The member fence->native_type was initially intended for this functionality, but because a fence object may be deleted before it signals, flush information may be lost. Better to use a driver-specific list and remove that member.
>
> So to correctly decide whether to emit an in-ring flush, avoid any waiting, and avoid unnecessary flushes, we need the following information.
>
> 1) The old access mode of the buffer. The current implementation uses bo->fence_type for this. This is per buffer and not the same as bo->fence->type.

Ah, but it's not really 'fence_type' then -- it's the access mode from the previous execbuffer ioctl call.

> 2) The new access mode of the buffer. Immediate information.
>
> 3) Whether a flush affecting this buffer is pending in the ring. Determined from bo->fence->sequence and the breadcrumb of the last flush submitted to the ring.
>
> 4) Whether a flush affecting this buffer has already executed. Determined from bo->fence->signaled, which is asynchronously updated when a flush has executed.

That seems to assume that every fence would include a flush operation, something we're working hard to avoid. Otherwise, the buffer object may be associated with a non-flushing fence, and a flush operation later on in the command sequence would not mark this buffer as flushed.

Again, conflating fences with flushing doesn't make sense -- fences need not flush, and flushes need not fence.

--
[EMAIL PROTECTED]