On Sat, 2007-12-22 at 23:40 +0100, Thomas Hellström wrote:
> DRM_I915_FENCE_TYPE_READ | DRM_I915_FENCE_TYPE_WRITE | DRM_FENCE_TYPE_EXE
The union of all access modes is always read|write|exe (read from VBOs, write to the back buffer, execute from the batch buffer). Hence, these bits cannot carry any novel information, as every fence looks exactly alike. Having extra bits in the fence which reflect the access mode is thus not useful in any driver.

What we need is a way to know what kind of flushing is needed before queueing a batch buffer, or moving a buffer between memory types. On Intel, we have three kinds of flushing:

 1. Flush the CPU cache and GWB. Used when moving buffers from main
    memory to the GTT.

 2. Flush the GTT cache. Used when switching buffers from read to
    write mode inside the GPU.

 3. Flush the GTT cache and wait for the GPU to process the flush.
    Used when moving memory out of the GTT.

Only type 3 requires any kind of synchronization with the GPU, and type 3 is also the least common (assuming we aren't thrashing textures). Therefore, we should certainly optimize for cases 1 and 2.

Towards this end, I believe that flushing should be separated from fencing. For Intel, cases 1 and 2 complete effectively immediately (case 1 is a sequence of synchronous register writes; case 2 requires appending an MI_FLUSH instruction to the ring). Case 3 requires waiting for an EXE fence to pass.

So, I suggest that the driver be given a flush entry point which synchronously waits for the operation to complete. On Intel, case 3 will require constructing a fence and waiting for that, but the other two cases can be handled without any pausing. Hardware which requires polling to complete its flushing operations can perform that polling within a simple loop that doesn't involve any fencing at all. Hardware which also requires polling to detect fence completion (which is lame, but I'm sure there is such hardware out there) would implement its fence waits with polling.
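To make the case split concrete, here is a user-space C sketch of such a driver flush entry point. All names and the stubbed hardware operations (driver_flush, emit_mi_flush, emit_exe_fence, wait_fence) are hypothetical illustrations, not the actual DRM driver interface:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flush types mirroring the three Intel cases above. */
enum flush_type {
    FLUSH_CPU_AND_GWB,   /* case 1: moving buffers from main memory to GTT */
    FLUSH_GTT,           /* case 2: switching read->write inside the GPU   */
    FLUSH_GTT_AND_WAIT   /* case 3: moving memory out of the GTT           */
};

/* Stubbed hardware operations, for illustration only. */
static void cpu_cache_and_gwb_flush(void) { /* synchronous register writes */ }
static void emit_mi_flush(void)           { /* append MI_FLUSH to the ring */ }
static unsigned emit_exe_fence(void)      { return 1; /* fake breadcrumb  */ }
static void wait_fence(unsigned f)        { (void)f;  /* sleep or poll    */ }

/*
 * Hypothetical synchronous flush entry point. It returns true only when
 * it actually had to wait on the GPU, which happens in case 3 alone:
 * cases 1 and 2 complete (or are queued) without any pause.
 */
static bool driver_flush(enum flush_type type)
{
    switch (type) {
    case FLUSH_CPU_AND_GWB:
        cpu_cache_and_gwb_flush();    /* completes immediately */
        return false;
    case FLUSH_GTT:
        emit_mi_flush();              /* queued in the ring, no wait */
        return false;
    case FLUSH_GTT_AND_WAIT:
        emit_mi_flush();
        wait_fence(emit_exe_fence()); /* the only case needing a fence */
        return true;
    }
    return false;
}
```

A polling-only chip would simply loop inside its equivalent of driver_flush; the caller's view stays the same.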
Given two different kinds of fence waiting (polling vs. interrupts), it therefore seems sensible to push the implementation of this operation into the driver as well; we can provide DRM macros to abstract the OS wait interface to keep this code OS-independent.

> It will not require any synchronization with the extension mentioned.
> Rather, it's a restriction that a single buffer cannot be accessed in
> both read and write mode by the same superioctl command submission.
> There must be a call to drm_bo_do_validate() in between to switch access
> mode on the buffer object and emit the MI_FLUSH operation, and that call
> needs to be made using a new superioctl call. I'm not sure how else you
> would tell the kernel that we are switching buffer access modes?

The question is where this information lives. The fence contains no information about the access modes for each buffer, so that information must be saved inside the buffer object. However, when an MI_FLUSH operation is placed in the ring, *all* buffers get flushed at that point, so we must update the flush-pending status on all buffers at that point.

I don't believe this is equivalent to fencing -- most of the time, there's no need for the process to wait for the flush to occur, only to know that it has been queued. Of course, if you want to pull the object from the GTT, you'll have to wait for that flush to execute, which means knowing which buffers are flushed at each breadcrumb. That could use the fencing mechanism for in-ring flush operations.

Here's what I imagine wanting:

 * Flushing CPU caches and the GWB is a synchronous operation; no fencing
   or waiting is required. We already have most of this as
   bo_driver->ttm_cache_flush, but it's missing the GWB flush at present
   (which is handled separately inside the execbuffer ioctl).

 * Adding a flush operation to the ring. This would mark all buffers which
   have been accessed since the last flush with a new breadcrumb.
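A minimal sketch of that breadcrumb marking, with hypothetical structures standing in for the real buffer objects (none of these names are the actual drm_buffer_object fields):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-buffer state, for illustration only. */
struct bo {
    bool accessed;              /* touched since the last ring flush    */
    unsigned flush_breadcrumb;  /* breadcrumb of the flush covering it,
                                 * 0 = no flush queued for this buffer  */
};

static unsigned last_breadcrumb; /* breadcrumb most recently emitted */

/*
 * Queue one in-ring flush. Because a single MI_FLUSH covers every
 * buffer accessed since the previous flush, all of them are stamped
 * with the new breadcrumb here. Note that nothing waits: the caller
 * only learns that the flush has been queued.
 */
static unsigned queue_ring_flush(struct bo *bos, int n)
{
    unsigned crumb = ++last_breadcrumb; /* emit_mi_flush() would go here */
    for (int i = 0; i < n; i++) {
        if (bos[i].accessed) {
            bos[i].flush_breadcrumb = crumb;
            bos[i].accessed = false;
        }
    }
    return crumb;
}

/*
 * Only eviction from the GTT needs to wait: the buffer is flushed once
 * the GPU has retired its breadcrumb.
 */
static bool bo_is_flushed(const struct bo *b, unsigned retired)
{
    return b->flush_breadcrumb != 0 && b->flush_breadcrumb <= retired;
}
```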
That flush would generate an IRQ so that buffers can be updated as the breadcrumb passes, and so that applications can wait for buffers to become flushed.

For chips not supporting in-ring flush operations, 'add a flush to the ring' would presumably become 'start a flush operation', and 'wait for flush to complete' would become 'poll for the flush operation to complete'.

Separating flushing from the execution breadcrumb would eliminate the current complexity of fencing, and also provide a clear separation between flushing, which usually requires no delay, and fencing, which always does.

-- 
[EMAIL PROTECTED]
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel