Re: Buffer object access mode is per-operation, not per-buffer

Thomas Hellström Thu, 27 Dec 2007 01:35:01 -0800

Keith Packard wrote:

>
>In particular, I think that adding what amounts to a state machine to
>the fencing logic is a bad idea -- if the driver needs a state machine
>to complete a particular fence, the driver should implement that state
>machine internally. If the driver has no IRQs to drive that state
>machine, it should, internally, use other kernel mechanisms to perform
>polling. Conflating the simple fence mechanism with driver-specific
>state machine operation adds complexity to every driver, instead of only
>to drivers which need these additional steps.
>
>I would see each fence having a *single* type, which lets the driver
>know how to signal it correctly. The driver could attach additional
>information about the fence outside of the core code, allowing it to
>sequence the fence through the necessary stages before it becomes
>signalled.
>
>Then, fence users would block waiting for the fence to become signalled,
>the sequence of operations leading to that state transition would be
>hidden from the fence users and the driver would be able to perform
>whatever code was necessary behind the application's back.
>
>  
>
This would not be sufficient to optimize the (re-)use of buffers.
Exec buffers should be released when the EXE type has signalled.
For Poulsbo, vertex, index and command stream buffers can be released 
when the TA (aka BINNER) is done, whereas texture buffers can be released 
only when the RASTERIZER done type signals.
All of this is for the same sequence of commands. So if we expose these 
stages, we must also expose a way for the waiting code to make sure that 
they eventually signal.
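
To make the point about per-stage release concrete, here is a rough sketch 
in C. It is not the actual DRM fence code: the type bits, struct layouts 
and function names are all made up for illustration. It only assumes that 
one fence carries a bitmask of signalled types and that each buffer 
records which types it has to wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence type bits, named after the stages discussed above. */
enum {
    FENCE_TYPE_EXE        = 1u << 0, /* command stream has been executed */
    FENCE_TYPE_TA         = 1u << 1, /* TA (binner) is done */
    FENCE_TYPE_RASTERIZER = 1u << 2  /* rasterizer is done */
};

/* One fence per command submission; stages signal independently. */
struct fence {
    uint32_t sequence;       /* sequence number of the submission */
    uint32_t signaled_types; /* bits set as each stage completes */
};

/* A buffer remembers the last fence that used it and what to wait for. */
struct buffer {
    const struct fence *fence; /* NULL if the buffer was never submitted */
    uint32_t wait_types;       /* stages that must signal before reuse */
};

/* Called by the (hypothetical) driver when one or more stages complete. */
static void fence_signal(struct fence *f, uint32_t types)
{
    assert(f != NULL);
    f->signaled_types |= types;
}

/* A buffer is idle once every type it depends on has signalled. */
static bool buffer_idle(const struct buffer *b)
{
    assert(b != NULL);
    if (b->fence == NULL)
        return true;
    return (b->fence->signaled_types & b->wait_types) == b->wait_types;
}
```

With this layout a vertex buffer with wait_types = FENCE_TYPE_TA becomes 
reusable as soon as the binner is done, while a texture buffer on the same 
fence stays busy until FENCE_TYPE_RASTERIZER signals.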


If you want to implement flushing separately from the fencing 
code and expose only a single EXE type, the current generic code does 
allow you to do that.

>By exposing this beyond the driver, you've made it effectively
>impossible for drivers to avoid polling though. If you make the 'wait'
>operation driver-specific, then all of this complexity can be removed
>from the general code and applications see simple 'wait' operations. I
>could easily see providing multiple versions of the 'wait' function in
>the driver-independent code, with a regular blocking one used by
>hardware that didn't require polling.
>  
>
Right. I agree a driver-specific wait is a good idea.
However, we must be careful about how the word "polling" is used.
A fence flush may well be used to turn on IRQ flags in the presence of 
waiters.

>>Nothing says we need to implement Intel RW GPU flushing as a fence 
>>flushing operation. In fact, we can restructure the Intel driver to only 
>>use EXE fencing in all situations and remove fence-induced GPU flushing, 
>>but determining buffer idle in all situations listed above would become 
>>very awkward.
>>    
>>
>
>Buffer idle is easy to know -- if the buffer has not been accessed since
>the last flush operation known to have executed, then the buffer is
>idle. This can be encoded in a separate 'flush' operation which would
>cause buffers to become idle WRT the CPU/GPU as necessary. Those
>operations may (in a driver-dependent fashion) use EXE fences, but they
>certainly wouldn't have to, other than checking to make sure the buffer
>wasn't queued for GPU access.
>  
>
And how would a user-space sub-allocator distinguish between EXE buffer 
idle and texture buffer idle for the same command sequence?

>
>From what I can understand, these 'fence flushes' are really the
>mechanism used to poll the driver for fence completion in the absence of
>IRQ-driven fencing.
>
>  
>
Yes, or if the IRQs are turned off because there are no waiters.
But this is not expensive polling. A waiter would do a single poll 
and then wait.
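
The "single poll, then wait" behaviour could look roughly like the sketch 
below. This is a user-space simulation, not the real fence code: the ops 
structure, the field names and the simulated driver are assumptions made 
up for this mail. The flush hook stands in for whatever the driver does 
to check the hardware once and turn on IRQs when a waiter appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_fence;

/* Hypothetical driver hooks; not the actual DRM interface. */
struct sim_fence_ops {
    /* Check the hardware once and, if needed, turn on the IRQ. */
    void (*flush)(struct sim_fence *f);
    /* Sleep until the IRQ handler marks the fence as signalled. */
    void (*block)(struct sim_fence *f);
};

struct sim_fence {
    const struct sim_fence_ops *ops;
    bool signaled;
    bool irq_enabled;
    int flush_calls; /* counters, for illustration only */
    int block_calls;
};

/* Waiter path: at most one poll (flush), then a blocking wait. */
static void sim_fence_wait(struct sim_fence *f)
{
    assert(f != NULL && f->ops != NULL);
    if (f->signaled)
        return;            /* already done: no poll, no sleep */
    f->ops->flush(f);      /* the single poll */
    f->flush_calls++;
    if (f->signaled)
        return;            /* the poll found the fence completed */
    f->ops->block(f);      /* otherwise sleep until the IRQ fires */
    f->block_calls++;
}

/* Simulated driver: flush only enables the IRQ ... */
static void sim_flush(struct sim_fence *f)
{
    f->irq_enabled = true;
}

/* ... and block pretends the IRQ arrived, if it was enabled. */
static void sim_block(struct sim_fence *f)
{
    if (f->irq_enabled)
        f->signaled = true;
}

static const struct sim_fence_ops sim_ops = { sim_flush, sim_block };
```

A second call to sim_fence_wait() on the same fence returns on the fast 
path without touching the driver hooks again.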

>in-line access-mode changes are critical to 2D driver performance, so
>this is rather interesting to me.
>
>  
>
So, we've agreed on
1) Driver-specific waits are a good thing. Let's implement them.
2) We need a new driver entry point to emit in-line flushes at validate 
time. I would like the generic code to call this entry point when the 
validate code detects that the previous and new access mode flags are 
incompatible.
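
As a rough sketch of point 2, the generic validate path could call a 
driver hook along these lines. Everything here is hypothetical: the flag 
names, the emit_inline_flush entry point and especially the compatibility 
rule, which is a placeholder that a real driver would replace with 
whatever its caches actually require.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-operation access mode flags. */
#define ACCESS_READ  (1u << 0)
#define ACCESS_WRITE (1u << 1)

struct bo {
    uint32_t last_access; /* access flags of the previous operation, 0 if none */
};

struct driver {
    /* Hypothetical entry point: emit an in-line flush into the command stream. */
    void (*emit_inline_flush)(struct driver *drv, struct bo *bo,
                              uint32_t old_flags, uint32_t new_flags);
    int flushes_emitted; /* counter, for illustration only */
};

/*
 * Placeholder rule (an assumption): the modes are incompatible when the
 * buffer was used before, the access mode changes, and a write is involved
 * on either side.
 */
static bool access_incompatible(uint32_t old_flags, uint32_t new_flags)
{
    return old_flags != 0 &&
           old_flags != new_flags &&
           ((old_flags | new_flags) & ACCESS_WRITE) != 0;
}

/* Generic validate step: ask the driver for a flush only on a mode conflict. */
static void bo_validate(struct driver *drv, struct bo *bo, uint32_t new_flags)
{
    assert(drv != NULL && bo != NULL);
    if (access_incompatible(bo->last_access, new_flags) &&
        drv->emit_inline_flush != NULL)
        drv->emit_inline_flush(drv, bo, bo->last_access, new_flags);
    bo->last_access = new_flags;
}

/* Simulated driver hook that only counts the flushes it was asked for. */
static void count_flush(struct driver *drv, struct bo *bo,
                        uint32_t old_flags, uint32_t new_flags)
{
    (void)bo;
    (void)old_flags;
    (void)new_flags;
    drv->flushes_emitted++;
}
```

Validating the same buffer twice with the same access mode emits nothing, 
while a switch from write to read (say, render target to texture) calls 
the hook once.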

My current opinion is that

    * Fence types are needed to expose completion steps to be able to
      reuse buffers without excessive waits.
    * If a driver writer doesn't care about this, all fencing can be
      implemented with an EXE type only.
    * The current implementation, with the extensions outlined above and
      in previous mails, would be able to do what you want in a
      close-to-optimal way.
    * I want to avoid core flush-specific buffer object lists as I
      believe they duplicate information easily obtainable by other means.
    * I want to avoid core pending flush lists at this point as most
      current hardware doesn't need to use them. I believe currently
      Intel and Radeon would be the only ones. Poulsbo has very small
      (16x16 pixels) rendering caches since it's tile-based, so it
      doesn't implement any separate flushing, and probably wouldn't
      benefit from it anyway. Unichromes and SiS apparently flush when
      switching from 3D to 2D, and that's done when a breadcrumb write
      is executed.
    * GPU flushing need not be tied to the fencing mechanism, but there
      must be a way (exposed or not) for a fence waiter to initiate a
      flush which hasn't yet been queued.

Are there any other issues I fail to see here, such as problems flushing 
at wait time due to ring locking issues, etc.?

Otherwise either I fail to see a crucial detail, or you fail to see the 
capabilities of the current code.

By far the easiest way to remedy both those situations would be for you to 
provide a detailed example (in pseudo-code or whatever) where you feel 
the suggested solution fails to do what you want, or would issue an 
unnecessary flush (such as a flush that is not issued on-demand or is 
duplicated) or an unnecessary wait.

If, on the other hand, we agree that both solutions are comparable in 
terms of efficiency and capabilities, we should probably start listing 
important pros and cons; otherwise we could continue this discussion 
forever...

/Thomas





--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
