On Fri, 2007-12-21 at 13:21 +0100, Thomas Hellström wrote:

> Keith,
> This is purely an optimization.
> Buffers that are evicted due to a memory aperture space shortage ends up 
> in a
> state where they are unbound from the aperture, but retain the uncached 
> state assuming the next operation on them will be "validate".

Ok, so we have several primitive operations here and several useful
combinations. I think we just need some more consistent naming and
documentation to reflect which function performs which primitive
operations, and what the intended use cases are. Right now, the function
names are not descriptive to me, and there is certainly a lack of
documentation in the code.

> In that case they are just bound back. However, if they are instead 
> accessed by the cpu, the caching state needs to be fixed up. Hence the 
> name. The caching fixup is, as you know, a very costly operation.

Now that we know how to flush the GMCH GWB, we can leave pages mapped to
the CPU at all times, so this turns out to be 'free' if we know when
objects have been accessed, which validation flags will tell us. So, my
expectation is that pages would never be unmapped from the CPU just to
be mapped to the GPU for the Intel driver at least. Not to say that
these operations shouldn't be made available, just that they aren't
useful for this driver.

It might be useful to clarify the ttm state names (in particular, it
seems like ttm_unbound and ttm_evicted are reversed), then we could
create functions like 'drm_ttm_unbind' which would move a buffer from
bound to unbound, and drm_ttm_evict which would move a buffer from bound
or unbound to evicted. That might lessen the unexposed ordering
dependencies between the various functions. For instance, calling
drm_ttm_fixup_caching to unbind a buffer won't work unless that buffer
is alread evicted.

> drm_bo_handle_validate() needed to be split for the superioctl, and 
> drm_bo_do_validate() was created. It's possible this can be simplified.

Ok, with superioctl, what I think we want is drm_bo_validate which does:

        error out if the buffer is unfenced
        lock bo mutex
        whack flags
        move the buffer (waiting, if necessary)
        unlock bo mutex

> 1) Intel read/write sync flushes must be issued after the EXE TYPE flag 
> has signaled, so we flush the correct operation. This was implemented 
> and optimized for the case where the kernel needed to evict a texture 
> (read flush) and didn't want to wait for the flush operation to pass 
> down the command queue, effectively causing a command queue flush and 3D 
> engine idle. As discussed before we ended up not using the sync flush 
> code...

My key concern with this code at present is that it requires polling,
where the CPU wakes up frequently and bangs on the hardware. This is
costly in CPU time and power, and also delays application execution
whenever a flush is necessary as the polling code does go briefly idle
between operations.

I'm ok with removing it then; the sync flush code would require that the
CPU poll the graphics engine, something I'm really uninterested in
seeing happen. As an alternative, we can use a separate command queue at
higher priority to flush the GPU, and that can actually give us an
interrupt.

The portion of the code which has me completely puzzled is the use of
the flush_mask and submitted_flush fields. They get modified and used
without any obvious sense. If we could eliminate those and just use the
fence->type (which should be fence->types_waiting) and fence->signaled
(which should be fence->types_signaled), the code would be far more
understandable. 

> 
> 2) Unichrome type hardware: To guarante for example HighQualityVideo 
> engine idle, the EXE type fence must have signaled, and  the engine must 
> signal HQV engine idle using a hardware flag. ..

Is there no interrupt available here? If not, then some kind of polling
is clearly required, but I'd much rather that happen in the Unichrome
driver itself, rather than complicating the general DRM driver
structure.

> So if the caller for example wants to wait for a Intel read type fence 
> to signal, the code must first make sure the EXE flag has signaled, then 
> and only then it indicates to the device dependant code that a flush is 
> needed, and either sleep-waits or polls the device dependant code for 
> flushing to complete

read fencing only requires that we have fences after each operation.
It's the write fencing which is expensive as that requires an MI_FLUSH,
which we'd love to remove from the end of every batch buffer at some
point. However, as described above, we can place the MI_FLUSH in a
separate command stream and run that in the middle of other existing
commands, allowing us to avoid the polling mode of the current code.

> Hope this clarifies things a little.

Yes, as always, additional descriptions of how this code works help a
lot.

-- 
[EMAIL PROTECTED]

Attachment: signature.asc
Description: This is a digitally signed message part

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel

Reply via email to