Keith Packard wrote:
> On Tue, 2007-12-25 at 14:24 +0100, Thomas Hellström wrote:
>
>> The fence type bits are solely intended to indicate different completion
>> stages, so if the union of access modes defined as (RWX) is the same for
>> all fence objects of a particular driver, you're right. Fence flushing of
>> a particular fence type is intended to make sure that that fence type
>> will eventually signal. This functionality (which may seem awkward and
>> unnecessary at first) can be used to turn irqs on / off in the presence /
>> absence of waiters, and can be used to perform things like RW flushing
>> on Intel chipsets.
>
> RW flushing is independent of fencing for most use cases -- when
> switching between reading and writing operations in the ring, there's no
> need to fence the operation, only to place a flush in the command
> stream. The only time when flushing is related to fencing is when you
> need to wait for a fence to expire with a flush operation in front of
> it.
>
> Mixing flushing and fencing in the same mechanism has made that very
> difficult to understand, and I believe it has removed the common
> optimization where flushing needn't wait on a fence.

It seems the reason we don't agree on this is that we have different views of what a fence is, and that we're approaching the same problem from different directions.
Let's start with the fence concept in the buffer manager code. A fence object represents a sequence of GPU commands and carries information about the completion and error status of that sequence. Each fence_type flag is a step in this completion status, and a fence is not completely signaled until all of its fence_type flags have signaled. The meaning of the fence_type flags is driver-dependent.

With the current Intel driver implementation, the EXE type means that the GPU has started working on the command sequence. This information is useless to any waiter except those wanting to idle batch buffers. When the RW flag has signaled, it means that all commands have finished execution and caches are flushed. This is useful for buffer eviction, buffer mapping, buffer deletion and user-space sub-allocator buffer reuse. Other drivers may have yet other fence_type flags. Poulsbo, for example, has flags indicating "register value-pairs written", "binner done", "rasterizer done", "binner memory deallocated" and "GPU queries answered".

Some of these flags may not signal automatically, for different reasons: IRQs may be turned off because there are no waiters, there may be no hardware IRQ for the flag, or the GPU may need some kind of flushing. Therefore the concept of fence flushing was invented. Fence flushing is NOT equal to GPU flushing. Rather, fence flushing means "there are waiters for this fence_type, please make sure to update it and make sure it will eventually signal."

If we've created a fence object, we don't necessarily have to wait for it to signal completely. Each buffer object carries per-buffer information about which fence flags to wait on to idle the buffer. A fence object can also be removed completely before it has fully signaled, if nothing references it.
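As a rough illustration of the model just described, the fence/buffer relationship might be sketched in C roughly as follows. All names here (sketch_fence, sketch_bo, the FENCE_TYPE_* values) are hypothetical, chosen for the example; they are not the actual drm_fence_object API.

```c
#include <stdint.h>

/* Driver-defined completion stages; Intel-style EXE and RW as examples. */
#define FENCE_TYPE_EXE 0x01  /* GPU has started on the command sequence  */
#define FENCE_TYPE_RW  0x02  /* all commands done and caches flushed     */

struct sketch_fence {
    uint32_t type;      /* fence_type flags this fence will report   */
    uint32_t signaled;  /* subset of ->type that has signaled so far */
};

struct sketch_bo {
    struct sketch_fence *fence; /* fence covering last access, or NULL  */
    uint32_t fence_type;        /* per-buffer flags to wait on for idle */
};

/* A fence is completely signaled only when every flag in ->type is set. */
static int fence_fully_signaled(const struct sketch_fence *f)
{
    return (f->signaled & f->type) == f->type;
}

/* A buffer is idle as soon as the flags *it* cares about have signaled;
 * it need not wait for the whole fence to signal completely. */
static int bo_is_idle(const struct sketch_bo *bo)
{
    if (!bo->fence)
        return 1;
    return (bo->fence->signaled & bo->fence_type) == bo->fence_type;
}
```

The point of the separate per-buffer fence_type is visible in bo_is_idle(): a batch buffer that only waits on EXE can be reused while the RW stage of the same fence is still outstanding.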
Nothing says we need to implement Intel RW GPU flushing as a fence flushing operation. In fact, we could restructure the Intel driver to use only EXE fencing in all situations and remove fence-induced GPU flushing, but determining buffer idle in all the situations listed above would become very awkward. Also, if we assume that the Intel docs are correct about read flushing, or that Intel hardware engineers one day decide to implement genuinely out-of-order GPU command execution, read flushing would become a very common operation for texture buffer mapping or user-space texture memory management.

> I'd say a simpler mechanism would be to use a helper thread that polls
> when such delayed work needs to be done without having an obvious
> process context available. It seems like you're counting on having some
> process polling the hardware to clean things up, but we need to deal
> with the idle process cases where we want to release resources
> reasonably quickly instead of waiting for that process to wake up and
> process more requests.

Nope. Delayed delete is handled by a kernel worker thread.

> Please consider removing the delayed delete mechanism from fencing --
> they are completely unrelated, and a polling driver should be using a
> worker thread to clean up instead of hoping that some other process will
> come along and poll so that objects may be freed.
>
> Fencing should be restricted to knowing when the GPU has completed a
> sequence of commands.

See the above. Delayed delete is not used by fencing. Rather, it uses fences to determine when buffers are idle. In order to do that it must be able to emit fence flushes, which may or may not correspond to GPU flushes.

Anyway, the inline access-mode-changing flush seems to be the source of most confusion. Any implementation of this would need a driver-specific list of pending flushes in the ring to allow for asynchronous operation, at least if we want to track flush execution and not just flush submission.
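The pending-flush list just mentioned, together with the flush decision it enables, might be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's actual code: breadcrumbs are modeled as monotonically increasing sequence numbers, the fixed-size list and all names (flush_tracker, record_flush, retire_flushes, need_flush) are hypothetical, and the "stale = old & ~new" test is a deliberate simplification of deciding when an access-mode change needs a flush.

```c
#include <stdint.h>

#define SKETCH_MAX_PENDING 16

struct pending_flush {
    uint32_t flush_type; /* access-mode bits this in-ring flush covers   */
    uint32_t breadcrumb; /* sequence number the GPU writes on execution  */
};

struct flush_tracker {
    struct pending_flush pending[SKETCH_MAX_PENDING];
    int num_pending;
    uint32_t last_submitted; /* breadcrumb of the last flush in the ring */
};

/* Flush submission: remember what was put in the ring and where. */
static void record_flush(struct flush_tracker *t, uint32_t flush_type,
                         uint32_t breadcrumb)
{
    if (t->num_pending < SKETCH_MAX_PENDING) {
        t->pending[t->num_pending].flush_type = flush_type;
        t->pending[t->num_pending].breadcrumb = breadcrumb;
        t->num_pending++;
    }
    t->last_submitted = breadcrumb;
}

/* Flush execution: the GPU wrote a breadcrumb. Drop every list entry it
 * covers; a real driver would also walk the fence list here and raise
 * the corresponding fence->signaled bits. Returns the number retired. */
static int retire_flushes(struct flush_tracker *t, uint32_t breadcrumb)
{
    int i, kept = 0, retired = 0;

    for (i = 0; i < t->num_pending; i++) {
        if (t->pending[i].breadcrumb <= breadcrumb)
            retired++;
        else
            t->pending[kept++] = t->pending[i];
    }
    t->num_pending = kept;
    return retired;
}

/* Emit an in-ring flush only when the buffer drops access modes and no
 * covering flush has already executed (per the signaled flags) or is
 * already pending in the ring at or after the buffer's fence sequence. */
static int need_flush(const struct flush_tracker *t,
                      uint32_t old_mode, uint32_t new_mode,
                      uint32_t fence_sequence, uint32_t fence_signaled)
{
    uint32_t stale = old_mode & ~new_mode;

    if (!stale)
        return 0; /* access mode only grows: nothing to flush        */
    if ((fence_signaled & stale) == stale)
        return 0; /* a covering flush has already executed           */
    if (t->last_submitted >= fence_sequence)
        return 0; /* a covering flush is already pending in the ring */
    return 1;
}
```

Note that need_flush() never blocks; it only compares sequence numbers and flag words, which is what makes the no-wait, no-unnecessary-flush property described below possible.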
Each list entry would carry the flush type and the corresponding breadcrumb. When that breadcrumb is written, the implementation would know that a flush has executed, and with the current fence implementation it would walk the fence list and update all active fence objects. The member fence->native_type was initially intended for this functionality, but because a fence object may be deleted before it signals, flush information may be lost. It is better to use a driver-specific list and remove that member.

So, to correctly decide whether to emit an in-ring flush, to avoid any waiting, and to avoid unnecessary flushes, we need the following information:

1) The old access mode of the buffer. The current implementation uses
   bo->fence_type for this. This is per buffer and not the same as
   bo->fence->type.
2) The new access mode of the buffer. Immediate information.
3) Whether a flush affecting this buffer is pending in the ring.
   Determined from bo->fence->sequence and the breadcrumb of the last
   flush submitted to the ring.
4) Whether a flush affecting this buffer has already executed.
   Determined from bo->fence->signaled, which is asynchronously updated
   when a flush has executed.

If !(3 or 4), then go ahead and emit a flush. This requires a new driver entry point, and while it uses fence information, it is not in any way related to fence waiting. (When the flush finally executes, it will asynchronously update all active fence objects and update the flush pending list, provided that IRQs are enabled.)

No unnecessary flushes, no buffer lists, no unnecessary waits, no buffer flush status flags, no lost information, and information is obtained only when needed.

/Thomas
-- 
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel