Keith Packard wrote:
> On Sat, 2007-12-22 at 23:40 +0100, Thomas Hellström wrote:
>
>   
>> DRM_I915_FENCE_TYPE_READ | DRM_I915_FENCE_TYPE_WRITE | DRM_FENCE_TYPE_EXE
>>     
>
> The union of all access modes is always read|write|exe (read from vbos,
> write to back buffer, execute from batch buffer). Hence, these bits
> cannot carry any novel information as every fence looks exactly alike.
> Having extra bits in the fence which reflect the access mode is thus not
> useful in any driver.
>   
The fence type bits are solely intended to indicate different completion 
stages, so if the union of access modes, defined as (RWX), is the same 
for all fence objects of a particular driver, you're right. Flushing of 
a particular fence type is intended to make sure that that fence type 
will eventually signal. This functionality (which may seem awkward and 
unnecessary at first) can be used to turn irqs on / off in the presence 
/ absence of waiters, and can be used to perform things like RW flushing 
on Intel chipsets.
Another use I foresee is (again with Unichrome as an example) marking a 
buffer for access with the mpeg or the HQV engine. These use the same 
command submission mechanism, but have separate IRQs or status bits to 
indicate engine idle.
Here, it is important that these fence objects are not removed from the 
fence active ring before the relevant status bits are checked.
> What we need is a way to know what kind of flushing is needed before
> queueing a batch buffer, or moving a buffer between memory types.
>
> On Intel, we have three kinds of flushing:
>
>      1. Flush CPU cache and GWB. Used when moving buffers from main
>         memory to the GTT.
>      2. Flush GTT cache. Used when switching buffers from read to write
>         mode inside the GPU.
>      3. Flush GTT cache and wait for the GPU to process the flush. Used
>         when moving memory out of the GTT.
>
> Only type 3. requires any kind of synchronization with the GPU, and type
> 3. is also the least common (assuming we aren't thrashing textures).
> Therefore, we should certainly optimize for cases 1. and 2.
>
> Towards this end, I believe that flushing should be separated from
> fencing. For Intel, case 1. and 2. are performed effectively immediately
> (case 1. is a sequence of register writes which are synchronous, case 2.
> requires appending an MI_FLUSH instruction to the ring). Case 3.
> requires waiting for an EXE fence to pass.
>
> So, I suggest that the driver be given a flush entry point which
> synchronously waits for the operation to complete. On intel, case 3.
> will require constructing a fence and waiting for that, but the other
> two cases can be handled without any pausing.
>   
> Hardware which requires polling to complete their flushing operations
> can perform polling within a simple loop that doesn't involve any
> fencing at all.
>
>   
It's important here that the delayed deletion of buffers is considered.
The code currently uses the fence flush mechanism to avoid busy-waiting 
also in the case where only polling is available. Uninterruptible 
busy-waiting in the kernel is generally something you don't want.
At the very least the code needs to schedule() after a small number of 
polls and return to user-space on pending signals.
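Something like this minimal kernel-side sketch is what I have in mind 
for the polling-only case (driver_flush_poll() is a made-up stand-in for 
whatever status-register read the driver needs):

/* Poll for flush completion without uninterruptible busy-waiting:
 * reschedule between short bursts of polling and bail out to
 * user-space if a signal is pending. */
static int drm_flush_poll_wait(struct drm_device *dev)
{
	int i;

	for (;;) {
		for (i = 0; i < 10; ++i) {
			if (driver_flush_poll(dev))	/* made-up hook */
				return 0;
			cpu_relax();
		}
		if (signal_pending(current))
			return -EAGAIN;	/* caller restarts the ioctl */
		schedule();
	}
}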
> Given two different kinds of fence waiting (polling vs interrupts), it
> therefore seems sensible to push the implementation of this operation
> into the driver as well; we can provide DRM macros to abstract the OS
> wait interface to keep this code OS-independent.
>   
Yes, we can provide a driver-specific optimized fence_wait method, 
perhaps with a default generic one. However, I do believe we still need 
to be able to do fence_flush and fence_signaled, considering the delayed 
delete mechanism. We probably also need to provide example code for the 
pure IRQ and the pure polling cases.
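That is, something along these lines in the fence driver interface (a 
sketch only; the member names are what I'd propose, not existing code):

/* Proposed driver fence interface: wait() is optional and can be
 * optimized per driver, while flush() and poll() must stay, since the
 * delayed-delete code relies on them even when nobody is waiting. */
struct drm_fence_driver_proposed {
	/* Check hardware status and report newly signaled types. */
	void (*poll)(struct drm_device *dev, uint32_t fence_class,
		     uint32_t waiting_types);
	/* Make sure the given types will eventually signal. */
	void (*flush)(struct drm_device *dev, uint32_t fence_class);
	/* Optional; if NULL the core falls back to a generic wait
	 * built on poll() / flush() and the fence wait queue. */
	int (*wait)(struct drm_fence_object *fence, int lazy,
		    int interruptible, uint32_t mask);
};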
>> It will not require any synchronization with the extension mentioned. 
>> Rather it's a restriction that a single buffer cannot be accessed in 
>> both read and write mode by the same superioctl command submission. 
>> There must be a call to drm_bo_do_validate() in between to switch access 
>> mode on the buffer object and emit the MI_FLUSH operation, and that call 
>> needs to be made using a new superioctl call. I'm not sure how else you 
>> would tell the kernel that we are switching buffer access modes?
>>     
>
> The question is where this information lies. The fence contains no
> information about the access modes for each buffer, so that information
> must be saved inside the buffer object. However, when an MI_FLUSH
> operation is placed in the ring, *all* buffers get flushed at that
> point, so we must update the flush pending status on all buffers at that
> point.
>   
I can't really see why.
We should be able to determine whether *any* operation has been queued 
or completed from the current fence implementation and the fence driver 
additions; if not, something is indeed wrong.
However, keeping track of all buffers with a certain pending flush 
status seems very awkward to me. If the buffer carries a reference to 
the completion information together with its access mode, we only need 
to update the completion information, which is not per buffer:

a) The operation in question (which would be the fence object).
b) The completion status of any flushes needed to complete the operation 
(which would be the fence object's signaled flags).
c) Whether there are any flushes pending in the command queue that would 
perform a particular flush operation on a fence sequence number of 
interest. (This wasn't needed with the sync flush.)
d) The knowledge that any flush performed will update the signaled flags 
of all preceding fence objects. (This is built into the current 
implementation.)

Now, c) isn't implemented, and you would need it for flush type 2) above. 
However, it would be easy to just keep the breadcrumb sequence number of 
the fence object corresponding to the last submitted in-ring flush in the 
driver-private data structure, remembering that a non-signaled fence 
object always has a valid sequence number, and that the global flush 
sequence numbers can be marked invalid when they are equal to the current 
breadcrumb sequence number. Care must be taken to avoid any wraparound 
issues.
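
For the wraparound issue, a signed-difference comparison of breadcrumb 
numbers should do; a sketch, reusing the flush_state struct from above:

/* True if a flush covering everything up to 'flush_seq' also covers
 * 'fence_seq'. The signed difference handles 32-bit wraparound as long
 * as outstanding fences span less than half the counter range. */
static inline int flush_covers(uint32_t flush_seq, uint32_t fence_seq)
{
	return (int32_t)(flush_seq - fence_seq) >= 0;
}

/* Invalidate the cached flush sequence once everything up to it has
 * retired, so a stale value can't match a future fence after the
 * counter wraps. */
static void age_write_flush(struct flush_state *fs,
			    uint32_t retired_breadcrumb)
{
	if (fs->write_flush_valid &&
	    fs->last_write_flush == retired_breadcrumb)
		fs->write_flush_valid = 0;
}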

> I don't believe this is equivalent to fencing -- most of the time,
> there's no need for the process to wait for the flush to occur, only to
> know that it has been queued. Of course, if you want to pull the object
> from the GTT, you'll have to wait for that flush to execute, which means
> knowing which buffers are flushed at each breadcrumb. Which could use
> the fencing mechanism, for in-ring flush operations.
>   
But the code wouldn't wait for the flush to occur.

If we take another look at flush type 2) above, and let's say we need to 
change the access mode from write to read for three buffers a, b and c 
during the validation sequence of a superioctl submission (a rough code 
sketch of this loop follows after the list):

1) The validation code inspects the per buffer access mode information 
bo->fence_type of buffer a) and determines that the WRITE flag is going 
to be cleared on that buffer. Hence an in-ring write flush might be needed.
2) The validation code inspects the fence object pointed to by the 
buffer. The WRITE flush type has not signaled.
3) The validation code emits an MI_FLUSH, updates the breadcrumb 
counter, updates the last_write_flush sequence to the current breadcrumb 
number and then emits a new fence object, f_new.
4) The validation code dereferences the old fence object pointed to by 
buffer a), f_old.
5) The validation code points the buffer to the new fence object, f_new, 
and places the buffer object on the unfenced list. The new per buffer 
access mode information bo->new_fence_type is set to contain the READ flag.
6) The validation code moves on to buffer b) and determines that the 
WRITE flag is going to be cleared on that buffer. Hence an in-ring write 
flush might be needed.
7) Buffer b) still has a pointer to f_old. The WRITE flush type has 
still not signaled.
8) The validation code determines that the sequence number of f_old is 
less than or equal to the last_write_flush sequence, and that the 
last_write_flush sequence is valid. It therefore doesn't emit a flush 
and puts buffer b) on the unfenced list. The new per buffer access mode 
information bo->new_fence_type is set to contain the READ flag.
9) The validation code moves on to buffer c) and determines that the 
WRITE flag is going to be cleared on that buffer. Hence an in-ring write 
flush might be needed.
10) Buffer c) still has a pointer to f_old, but by now the in-ring flush 
emitted for buffer a) has executed, and f_old has been updated by 
the fence IRQ handler so that the WRITE type (and perhaps even the READ 
type) has signaled. The buffer's fence pointer is set to NULL and f_old 
is dereferenced. Buffer c) is put on the unfenced list and the new per 
buffer access mode information bo->new_fence_type is set to contain 
the READ flag.
11a) Commands are put on the ring, a new fence object f_final is 
created, and buffer objects are updated to point to the new fence 
object, without waiting for any old fence objects to signal. The per 
buffer access mode information is updated, bo->fence_type := 
bo->new_fence_type, old fence objects are dereferenced as appropriate, 
and buffers are moved off the unfenced list.
11b) (Error scenario) Command submission is interrupted by a mouse-move 
signal, returning -EAGAIN. Buffers are moved off the unfenced list, 
retaining their old per buffer access mode information bo->fence_type 
and their old fence object pointers: a) points to f_new, b) points to 
f_old, and c) has a NULL fence object pointer. The caller is required to 
re-run the superioctl call.
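
As the promised sketch of this loop (reusing the structs and 
flush_covers() from earlier in this mail; DRM_I915_FENCE_TYPE_WRITE is 
the proposed flag quoted at the top, and emit_flush_and_breadcrumb(), 
drop_fence_ref() and move_to_unfenced_list() are made-up helpers):

/* One iteration of the validation loop in steps 1) - 10) above. */
static void validate_access_mode(struct drm_device *dev,
				 struct flush_state *fs,
				 struct buffer_info *bo,
				 uint32_t new_fence_type)
{
	struct fence_info *fence = bo->fence;
	int write_cleared =
		(bo->fence_type & DRM_I915_FENCE_TYPE_WRITE) &&
		!(new_fence_type & DRM_I915_FENCE_TYPE_WRITE);

	if (write_cleared && fence &&
	    !(fence->signaled & DRM_I915_FENCE_TYPE_WRITE)) {
		/* Step 8): is there already an in-ring flush queued
		 * that covers this fence's sequence number? */
		if (!fs->write_flush_valid ||
		    !flush_covers(fs->last_write_flush, fence->sequence)) {
			/* Step 3): emit MI_FLUSH, bump the breadcrumb and
			 * remember it as the last in-ring write flush. */
			fs->last_write_flush =
				emit_flush_and_breadcrumb(dev);
			fs->write_flush_valid = 1;
		}
	}

	/* Steps 4) - 5): drop the old fence reference, record the new
	 * access mode and put the buffer on the unfenced list. */
	drop_fence_ref(&bo->fence);
	bo->new_fence_type = new_fence_type;
	move_to_unfenced_list(bo);
}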

So while I can see the need for a new driver entry point for when the 
old and new access modes are incompatible (currently the code idles the 
buffer here), I still fail to see the need for any additional mechanism. 
It feels natural to otherwise only submit flushes on demand (from the 
fence waiting code), and to make sure that all flush completion 
information is properly propagated to all possible users of that 
information.
I also can't see the need for additional per-buffer flush pending 
information or per-flush-type buffer lists, as the fence code already 
updates all active fence objects with any relevant flush information.
Could you give a detailed example of where the current code is less 
efficient or fails to do what you want here?

Perhaps it's also worth considering doing flush type 2) from user-space?
If user-space detects that a buffer is changing access mode, it just 
emits an MI_FLUSH into the command stream. This would, however, also 
require that the superioctl code emits an MI_FLUSH when it switches to 
a new client.
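
Roughly like this on the user-space side (assuming the classic MI_FLUSH 
encoding, opcode 0x04 in bits 28:23; the batch helper, structs and 
access flags are made up for the example):

#define MI_FLUSH	(0x04 << 23)	/* assumed classic encoding */

#define ACCESS_READ	(1 << 0)
#define ACCESS_WRITE	(1 << 1)

struct winsys_buffer {
	unsigned last_flags;	/* access mode of the previous use */
};

/* Emit an MI_FLUSH into the batch when a buffer that was last written
 * is about to be used read-only; batch_emit_dword() and struct batch
 * stand in for the real batch-building code. */
static void note_buffer_access(struct batch *batch,
			       struct winsys_buffer *buf,
			       unsigned new_flags)
{
	if ((buf->last_flags & ACCESS_WRITE) &&
	    (new_flags & ACCESS_READ) && !(new_flags & ACCESS_WRITE))
		batch_emit_dword(batch, MI_FLUSH);

	buf->last_flags = new_flags;
}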

Regards,
Thomas