Keith Packard wrote:
> On Tue, 2007-12-25 at 14:24 +0100, Thomas Hellström wrote:
>
>> The fence type bits are solely intended to indicate different completion
>> stages, so if the union of access modes defined as (RWX) is the same for
>> all fence objects of a particular driver, you're right. Fence flushing of
>> a particular fence type is intended to make sure that that fence type
>> will eventually signal. This functionality (which may seem awkward and
>> unnecessary at first) can be used to turn irqs on / off in the presence /
>> absence of waiters, and can be used to perform things like RW flushing
>> on Intel chipsets.
>
> RW flushing is independent of fencing for most use cases -- when
> switching between reading and writing operations in the ring, there's no
> need to fence the operation, only to place a flush in the command
> stream. The only time when flushing is related to fencing is when you
> need to wait for a fence to expire with a flush operation in front of
> it.
>
> Mixing flushing and fencing in the same mechanism has made that very
> difficult to understand, and I believe it has removed the common
> optimization where flushing needn't wait on a fence.

It seems the reason we don't agree on this is that we have different views of what a fence is, and that we're approaching the same problem from different directions.
Let's start with the fence concept in the buffer manager code. A fence object represents a sequence of GPU commands and carries information about the completion and error status of that sequence. Each fence_type flag is a step in this completion status, and a fence is not completely signaled until all of its fence_type flags have signaled. The meaning of the fence_type flags is driver-dependent.

With the current Intel driver implementation, the EXE type means that the GPU has started working on the command sequence. This information is useless to any waiter except those wanting to idle batch buffers. When the RW flag has signaled, it means that all commands have finished execution and caches are flushed. This is useful for buffer eviction, buffer mapping, buffer deletion and user-space sub-allocator buffer reuse. Other drivers may have yet other fence_type flags. Poulsbo, for example, has flags indicating "register value-pairs written", "binner done", "rasterizer done", "binner memory deallocated" and "GPU queries answered".

Some of these flags may not signal automatically, for different reasons: IRQs may be turned off because there are no waiters, there may be no hardware IRQ for the flag, or the GPU may need some kind of flushing. Therefore the concept of fence flushing was invented. Fence flushing is NOT equal to GPU flushing. Rather, fence flushing means "there are waiters for this fence_type, please make sure to update it and make sure it will eventually signal."

If we've created a fence object, we don't necessarily have to wait for it to signal completely. Each buffer object carries per-buffer information about which fence flags to wait on to idle the buffer. A fence object can also be removed completely before it has fully signaled, if nothing references it.
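As a rough illustration of the model just described, the fence/buffer relationship might be sketched in C roughly as follows. All names here (sketch_fence, sketch_bo, the FENCE_TYPE_* values) are hypothetical, chosen for the example; they are not the actual drm_fence_object API.

```c
#include <stdint.h>

/* Driver-defined completion stages; Intel-style EXE and RW as examples. */
#define FENCE_TYPE_EXE 0x01  /* GPU has started on the command sequence  */
#define FENCE_TYPE_RW  0x02  /* all commands done and caches flushed     */

struct sketch_fence {
    uint32_t type;      /* fence_type flags this fence will report   */
    uint32_t signaled;  /* subset of ->type that has signaled so far */
};

struct sketch_bo {
    struct sketch_fence *fence; /* fence covering last access, or NULL  */
    uint32_t fence_type;        /* per-buffer flags to wait on for idle */
};

/* A fence is completely signaled only when every flag in ->type is set. */
static int fence_fully_signaled(const struct sketch_fence *f)
{
    return (f->signaled & f->type) == f->type;
}

/* A buffer is idle as soon as the flags *it* cares about have signaled;
 * it need not wait for the whole fence to signal completely. */
static int bo_is_idle(const struct sketch_bo *bo)
{
    if (!bo->fence)
        return 1;
    return (bo->fence->signaled & bo->fence_type) == bo->fence_type;
}
```

The point of the separate per-buffer fence_type is visible in bo_is_idle(): a batch buffer that only waits on EXE can be reused while the RW stage of the same fence is still outstanding.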
Nothing says we need to implement Intel RW GPU flushing as a fence flushing operation. In fact, we could restructure the Intel driver to use only EXE fencing in all situations and remove fence-induced GPU flushing, but determining buffer idle in all the situations listed above would become very awkward. Also, if we assume that the Intel docs are correct about read flushing, or that Intel hardware engineers one day decide to implement genuinely out-of-order GPU command execution, read flushing would become a very common operation for texture buffer mapping or user-space texture memory management.

> I'd say a simpler mechanism would be to use a helper thread that polls
> when such delayed work needs to be done without having an obvious
> process context available. It seems like you're counting on having some
> process polling the hardware to clean things up, but we need to deal
> with the idle process cases where we want to release resources
> reasonably quickly instead of waiting for that process to wake up and
> process more requests.

Nope. Delayed delete is handled by a kernel worker thread.

> Please consider removing the delayed delete mechanism from fencing --
> they are completely unrelated, and a polling driver should be using a
> worker thread to clean up instead of hoping that some other process will
> come along and poll so that objects may be freed.
>
> Fencing should be restricted to knowing when the GPU has completed a
> sequence of commands.

See the above. Delayed delete is not used by fencing. Rather, it uses fences to determine when buffers are idle. In order to do that it must be able to emit fence flushes, which may or may not correspond to GPU flushes.

Anyway, the inline access-mode-changing flush seems to be the source of most confusion. Any implementation of this would need a driver-specific list of pending flushes in the ring to allow for asynchronous operation, at least if we want to track flush execution and not just flush submission.
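The pending-flush list just mentioned, together with the flush decision it enables, might be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's actual code: breadcrumbs are modeled as monotonically increasing sequence numbers, the fixed-size list and all names (flush_tracker, record_flush, retire_flushes, need_flush) are hypothetical, and the "stale = old & ~new" test is a deliberate simplification of deciding when an access-mode change needs a flush.

```c
#include <stdint.h>

#define SKETCH_MAX_PENDING 16

struct pending_flush {
    uint32_t flush_type; /* access-mode bits this in-ring flush covers   */
    uint32_t breadcrumb; /* sequence number the GPU writes on execution  */
};

struct flush_tracker {
    struct pending_flush pending[SKETCH_MAX_PENDING];
    int num_pending;
    uint32_t last_submitted; /* breadcrumb of the last flush in the ring */
};

/* Flush submission: remember what was put in the ring and where. */
static void record_flush(struct flush_tracker *t, uint32_t flush_type,
                         uint32_t breadcrumb)
{
    if (t->num_pending < SKETCH_MAX_PENDING) {
        t->pending[t->num_pending].flush_type = flush_type;
        t->pending[t->num_pending].breadcrumb = breadcrumb;
        t->num_pending++;
    }
    t->last_submitted = breadcrumb;
}

/* Flush execution: the GPU wrote a breadcrumb. Drop every list entry it
 * covers; a real driver would also walk the fence list here and raise
 * the corresponding fence->signaled bits. Returns the number retired. */
static int retire_flushes(struct flush_tracker *t, uint32_t breadcrumb)
{
    int i, kept = 0, retired = 0;

    for (i = 0; i < t->num_pending; i++) {
        if (t->pending[i].breadcrumb <= breadcrumb)
            retired++;
        else
            t->pending[kept++] = t->pending[i];
    }
    t->num_pending = kept;
    return retired;
}

/* Emit an in-ring flush only when the buffer drops access modes and no
 * covering flush has already executed (per the signaled flags) or is
 * already pending in the ring at or after the buffer's fence sequence. */
static int need_flush(const struct flush_tracker *t,
                      uint32_t old_mode, uint32_t new_mode,
                      uint32_t fence_sequence, uint32_t fence_signaled)
{
    uint32_t stale = old_mode & ~new_mode;

    if (!stale)
        return 0; /* access mode only grows: nothing to flush        */
    if ((fence_signaled & stale) == stale)
        return 0; /* a covering flush has already executed           */
    if (t->last_submitted >= fence_sequence)
        return 0; /* a covering flush is already pending in the ring */
    return 1;
}
```

Note that need_flush() never blocks; it only compares sequence numbers and flag words, which is what makes the no-wait, no-unnecessary-flush property described below possible.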
Each list entry would carry the flush type and the corresponding breadcrumb. When that breadcrumb is written, the implementation would know that a flush has executed, and with the current fence implementation it would walk the fence list and update all active fence objects. The member fence->native_type was initially intended for this functionality, but because a fence object may be deleted before it signals, flush information may be lost. It is better to use a driver-specific list and remove that member.

So, to correctly decide whether to emit an in-ring flush, to avoid any waiting, and to avoid unnecessary flushes, we need the following information:

1) The old access mode of the buffer. The current implementation uses
   bo->fence_type for this. This is per buffer and not the same as
   bo->fence->type.
2) The new access mode of the buffer. Immediate information.
3) Whether a flush affecting this buffer is pending in the ring.
   Determined from bo->fence->sequence and the breadcrumb of the last
   flush submitted to the ring.
4) Whether a flush affecting this buffer has already executed.
   Determined from bo->fence->signaled, which is asynchronously updated
   when a flush has executed.

If !(3 or 4), then go ahead and emit a flush. This requires a new driver entry point, and while it uses fence information, it is not in any way related to fence waiting. (When the flush finally executes, it will asynchronously update all active fence objects and update the flush pending list, provided that IRQs are enabled.)

No unnecessary flushes, no buffer lists, no unnecessary waits, no buffer flush status flags, no lost information, and information is obtained only when needed.

/Thomas
-- 
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel