On Sat, 2007-12-22 at 23:40 +0100, Thomas Hellström wrote:

> DRM_I915_FENCE_TYPE_READ | DRM_I915_FENCE_TYPE_WRITE | DRM_FENCE_TYPE_EXE

The union of all access modes is always read|write|exe (read from VBOs,
write to the back buffer, execute from the batch buffer). Hence, these
bits carry no new information: every fence ends up looking exactly
alike. Having extra bits in the fence that reflect the access mode is
thus not useful in any driver.

What we need is a way to know what kind of flushing is needed before
queueing a batch buffer, or moving a buffer between memory types.

On Intel, we have three kinds of flushing:

     1. Flush CPU cache and GWB. Used when moving buffers from main
        memory to the GTT.
     2. Flush GTT cache. Used when switching buffers from read to write
        mode inside the GPU.
     3. Flush GTT cache and wait for the GPU to process the flush. Used
        when moving memory out of the GTT.
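
Roughly, as a sketch (the enum and its values below are invented,
nothing in the current tree defines them), those three classes could be
named something like:

enum i915_flush_class {
        I915_FLUSH_CPU_GWB,     /* 1. CPU cache + GWB; moving into the GTT    */
        I915_FLUSH_GTT,         /* 2. GTT cache only; read<->write switch     */
        I915_FLUSH_GTT_SYNC,    /* 3. GTT cache + wait for GPU; eviction path */
};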

Only type 3 requires any kind of synchronization with the GPU, and type
3 is also the least common (assuming we aren't thrashing textures).
Therefore, we should certainly optimize for cases 1 and 2.

Towards this end, I believe that flushing should be separated from
fencing. For Intel, cases 1 and 2 complete effectively immediately
(case 1 is a sequence of synchronous register writes; case 2 just
appends an MI_FLUSH instruction to the ring). Only case 3 requires
waiting for an EXE fence to pass.

So, I suggest that the driver be given a flush entry point which
synchronously waits for the operation to complete. On Intel, case 3
will require constructing a fence and waiting for it, but the other
two cases can be handled without any pausing.
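
A rough sketch of what that entry point might look like, using the
made-up enum above; the i915_* helpers here don't exist, they just
stand in for the three operations described:

/* Hypothetical synchronous flush entry point; only case 3 blocks. */
static int i915_flush(struct drm_device *dev, enum i915_flush_class class)
{
        switch (class) {
        case I915_FLUSH_CPU_GWB:
                /* Case 1: a short sequence of synchronous register writes. */
                i915_flush_cpu_and_gwb(dev);
                return 0;
        case I915_FLUSH_GTT:
                /* Case 2: append MI_FLUSH to the ring and return at once. */
                i915_emit_mi_flush(dev);
                return 0;
        case I915_FLUSH_GTT_SYNC:
                /* Case 3: the only blocking case; fence the flush and wait. */
                i915_emit_mi_flush(dev);
                return i915_wait_for_exe_fence(dev);
        }
        return -EINVAL;
}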

Hardware which requires polling to complete its flush operations can do
that polling in a simple loop that doesn't involve any fencing at all.

Hardware which also requires polling to detect fence completion (which
is lame, but I'm sure such hardware is out there) would implement its
fence waits with polling.

Given two different kinds of fence waiting (polling vs interrupts), it
therefore seems sensible to push the implementation of this operation
into the driver as well; we can provide DRM macros to abstract the OS
wait interface to keep this code OS-independent.
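
Something along these lines, say; the ops table, wait queue and
breadcrumb helper are all invented names, and real code would go
through the DRM OS-abstraction macros rather than calling the Linux
primitives directly:

/* Per-driver fence wait, so IRQ-driven and polling hardware can each
 * supply their own implementation.  All names here are hypothetical. */
struct drm_fence_wait_ops {
        int (*wait)(struct drm_device *dev, uint32_t breadcrumb);
};

/* IRQ-capable hardware: sleep until the completion interrupt fires. */
static int irq_fence_wait(struct drm_device *dev, uint32_t breadcrumb)
{
        return wait_event_interruptible(dev->fence_queue,
                        last_completed_breadcrumb(dev) >= breadcrumb);
}

/* Polling-only hardware: sleep-poll a status register instead. */
static int poll_fence_wait(struct drm_device *dev, uint32_t breadcrumb)
{
        while (last_completed_breadcrumb(dev) < breadcrumb)
                msleep(1);
        return 0;
}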

> It will not require any synchronization with the extension mentioned. 
> Rather it's a restriction that a single buffer cannot be accessed in 
> both read and write mode by the same superioctl command submission. 
> There must be a call to drm_bo_do_validate() in between to switch access 
> mode on the buffer object and emit the MI_FLUSH operation, and that call 
> needs to be made using a new superioctl call. I'm not sure how else you 
> would tell the kernel that we are switching buffer access modes?

The question is where this information lives. The fence contains no
information about the access modes of each buffer, so that information
must be saved inside the buffer object. However, when an MI_FLUSH
operation is placed in the ring, *all* buffers get flushed at that
point, so the flush-pending status of every buffer must be updated
there.

I don't believe this is equivalent to fencing -- most of the time,
there's no need for the process to wait for the flush to occur, only to
know that it has been queued. Of course, if you want to pull the object
from the GTT, you'll have to wait for that flush to execute, which means
knowing which buffers are flushed at each breadcrumb. That tracking
could use the fencing mechanism for in-ring flush operations.

Here's what I imagine wanting:

      * Flushing CPU caches and GWB is a synchronous operation; no
        fencing or waiting is required. We already have most of this as
        bo_driver->ttm_cache_flush, but it's missing the GWB flush at
        present (which is handled separately inside the execbuffer
        ioctl).
      * Adding a flush operation to the ring. This would mark every
        buffer accessed since the last flush with a new breadcrumb, and
        it would generate an IRQ so that buffers can be updated as the
        breadcrumb passes and applications can wait for buffers to
        become flushed (see the sketch after this list).
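
As a sketch of that second point (the buffer list, the per-buffer
flush_breadcrumb field and the emit helper are all invented here):

/* Queue an MI_FLUSH that raises an IRQ, and stamp every active buffer
 * with the resulting breadcrumb so we know when its flush has passed. */
static void i915_queue_flush(struct drm_device *dev)
{
        uint32_t breadcrumb = i915_emit_mi_flush_irq(dev);
        struct drm_buffer_object *bo;

        list_for_each_entry(bo, &dev->active_buffers, active_list)
                bo->flush_breadcrumb = breadcrumb;      /* flush queued */
}

/* Called from the IRQ handler once a flush breadcrumb has passed. */
static void i915_flush_completed(struct drm_device *dev, uint32_t breadcrumb)
{
        struct drm_buffer_object *bo;

        list_for_each_entry(bo, &dev->active_buffers, active_list)
                if (bo->flush_breadcrumb &&
                    bo->flush_breadcrumb <= breadcrumb)
                        bo->flush_breadcrumb = 0;       /* flush has landed */
}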

For chips that don't support in-ring flush operations, 'add a flush to
the ring' would presumably become 'start a flush operation', and 'wait
for flush to complete' would become 'poll for the flush operation to
complete'.

Decoupling flushing from the execution breadcrumb would eliminate the
current complexity of fencing, and it would draw a clear line between
flushing, which usually requires no delay, and fencing, which always
does.
