On Fri, 2007-12-21 at 13:21 +0100, Thomas Hellström wrote: > Keith, > This is purely an optimization. > Buffers that are evicted due to a memory aperture space shortage ends up > in a > state where they are unbound from the aperture, but retain the uncached > state assuming the next operation on them will be "validate".
Ok, so we have several primitive operations here and several useful combinations. I think we just need some more consistent naming and documentation to reflect which function performs which primitive operations, and what the intended use cases are. Right now, the function names are not descriptive to me, and there is certainly a lack of documentation in the code. > In that case they are just bound back. However, if they are instead > accessed by the cpu, the caching state needs to be fixed up. Hence the > name. The caching fixup is, as you know, a very costly operation. Now that we know how to flush the GMCH GWB, we can leave pages mapped to the CPU at all times, so this turns out to be 'free' if we know when objects have been accessed, which validation flags will tell us. So, my expectation is that pages would never be unmapped from the CPU just to be mapped to the GPU for the Intel driver at least. Not to say that these operations shouldn't be made available, just that they aren't useful for this driver. It might be useful to clarify the ttm state names (in particular, it seems like ttm_unbound and ttm_evicted are reversed), then we could create functions like 'drm_ttm_unbind' which would move a buffer from bound to unbound, and drm_ttm_evict which would move a buffer from bound or unbound to evicted. That might lessen the unexposed ordering dependencies between the various functions. For instance, calling drm_ttm_fixup_caching to unbind a buffer won't work unless that buffer is alread evicted. > drm_bo_handle_validate() needed to be split for the superioctl, and > drm_bo_do_validate() was created. It's possible this can be simplified. Ok, with superioctl, what I think we want is drm_bo_validate which does: error out if the buffer is unfenced lock bo mutex whack flags move the buffer (waiting, if necessary) unlock bo mutex > 1) Intel read/write sync flushes must be issued after the EXE TYPE flag > has signaled, so we flush the correct operation. This was implemented > and optimized for the case where the kernel needed to evict a texture > (read flush) and didn't want to wait for the flush operation to pass > down the command queue, effectively causing a command queue flush and 3D > engine idle. As discussed before we ended up not using the sync flush > code... My key concern with this code at present is that it requires polling, where the CPU wakes up frequently and bangs on the hardware. This is costly in CPU time and power, and also delays application execution whenever a flush is necessary as the polling code does go briefly idle between operations. I'm ok with removing it then; the sync flush code would require that the CPU poll the graphics engine, something I'm really uninterested in seeing happen. As an alternative, we can use a separate command queue at higher priority to flush the GPU, and that can actually give us an interrupt. The portion of the code which has me completely puzzled is the use of the flush_mask and submitted_flush fields. They get modified and used without any obvious sense. If we could eliminate those and just use the fence->type (which should be fence->types_waiting) and fence->signaled (which should be fence->types_signaled), the code would be far more understandable. > > 2) Unichrome type hardware: To guarante for example HighQualityVideo > engine idle, the EXE type fence must have signaled, and the engine must > signal HQV engine idle using a hardware flag. .. Is there no interrupt available here? If not, then some kind of polling is clearly required, but I'd much rather that happen in the Unichrome driver itself, rather than complicating the general DRM driver structure. > So if the caller for example wants to wait for a Intel read type fence > to signal, the code must first make sure the EXE flag has signaled, then > and only then it indicates to the device dependant code that a flush is > needed, and either sleep-waits or polls the device dependant code for > flushing to complete read fencing only requires that we have fences after each operation. It's the write fencing which is expensive as that requires an MI_FLUSH, which we'd love to remove from the end of every batch buffer at some point. However, as described above, we can place the MI_FLUSH in a separate command stream and run that in the middle of other existing commands, allowing us to avoid the polling mode of the current code. > Hope this clarifies things a little. Yes, as always, additional descriptions of how this code works help a lot. -- [EMAIL PROTECTED]
signature.asc
Description: This is a digitally signed message part
------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
-- _______________________________________________ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel